Single nucleotide polymorphisms and their use in genetic analysis

ABSTRACT

Molecules and methods suitable for identifying polymorphic sites in the genome of a plant or animal. The identification of such sites is useful in determining identity, ancestry, predisposition to genetic disease, the presence or absence of a desired trait, etc.

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] This application is a continuation-in-part of U.S. patent application Ser. No. 08/145,145 (filed Nov. 3, 1993).

FIELD OF THE INVENTION

[0002] The present invention is in the field of recombinant DNA technology. More specifically, the invention is directed to molecules and methods suitable for identifying single nucleotide polymorphisms in the genome of an animal, especially a horse or a human, and using such sites to analyze identity, ancestry or genetic traits.

BACKGROUND OF THE INVENTION

[0003] The capacity to genotype an animal, plant or microbe is of fundamental importance to forensic science, medicine, epidemiology and public health, and to the breeding and exhibition of animals. Such a capacity is needed, for example, to determine the identity of the causative agent of an infectious disease, to determine whether two individuals are related, or to establish whether a particular animal such as a horse is a thoroughbred.

[0004] The analysis of identity and parentage, along with the capacity to diagnose disease, is also of central concern to human, animal and plant genetic studies, particularly forensic or paternity evaluations, and in the evaluation of an individual's risk of genetic disease. Such goals have been pursued by analyzing variations in DNA sequences that distinguish the DNA of one individual from another.

[0005] If such a variation alters the lengths of the fragments that are generated by restriction endonuclease cleavage, the variations are referred to as restriction fragment length polymorphisms (“RFLPs”). RFLPs have been widely used in human and animal genetic analyses (Glassberg, J., UK patent Application 2135774; Skolnick, M. H. et al., Cytogen. Cell Genet. 32:58-67 (1982); Botstein, D. et al., Amer. J. Hum. Genet. 32:314-331 (1980); Fischer, S. G. et al., PCT Application WO90/13668; Uhlen, M., PCT Application WO90/11369). Where a heritable trait can be linked to a particular RFLP, the presence of the RFLP in a target animal can be used to predict the likelihood that the animal will also exhibit the trait. Statistical methods have been developed to permit the multilocus analysis of RFLPs such that complex traits that are dependent upon multiple alleles can be mapped (Lander, S. et al., Proc. Natl. Acad. Sci. (U.S.A.) 83:7353-7357 (1986); Lander, S. et al., Proc. Natl. Acad. Sci. (U.S.A.) 84:2363-2367 (1987); Donis-Keller, H. et al., Cell 51:319-337 (1987); Lander, S. et al., Genetics 121:185-199 (1989), all herein incorporated by reference). Such methods can be used to develop a genetic map, as well as to develop plants or animals having more desirable traits (Donis-Keller, H. et al., Cell 51:319-337 (1987); Lander, S. et al., Genetics 121:185-199 (1989)).

[0006] In some cases, the DNA sequence variations are in regions of the genome that are characterized by short tandem repeats (STRs) that include tandem di- or tri-nucleotide repeated motifs of nucleotides. These tandem repeats are also referred to as “variable number tandem repeat” (“VNTR”) polymorphisms. VNTRs have been used in identity and paternity analysis (Weber, J. L., U.S. Pat. No. 5,075,217; Armour, J. A. L. et al., FEBS Lett. 307:113-115 (1992); Jones, L. et al., Eur. J. Haematol. 39:144-147 (1987); Horn, G. T. et al., PCT Application WO91/14003; Jeffreys, A. J., European Patent Application 370,719; Jeffreys, A. J., U.S. Pat. No. 5,175,082; Jeffreys, A. J. et al., Amer. J. Hum. Genet. 39:11-24 (1986); Jeffreys, A. J. et al., Nature 316:76-79 (1985); Gray, I. C. et al., Proc. R. Acad. Soc. Lond. 243:241-253 (1991); Moore, S. S. et al., Genomics 10:654-660 (1991); Jeffreys, A. J. et al., Anim. Genet. 18:1-15 (1987); Hillel, J. et al., Anim. Genet. 20:145-155 (1989); Hillel, J. et al., Genet. 124:783-789 (1990)) and are now being used in a large number of genetic mapping studies.

[0007] A third class of DNA sequence variation results from single nucleotide polymorphisms (SNPs) that exist between individuals of the same species. Such polymorphisms are far more frequent than RFLPs, STRs and VNTRs. In some cases, such polymorphisms comprise mutations that are the determinative characteristic in a genetic disease. Indeed, such mutations may affect a single nucleotide in a protein-encoding gene in a manner sufficient to actually cause the disease (e.g., hemophilia or sickle-cell anemia). In many cases, these SNPs are in noncoding regions of a genome. Despite the central importance of such polymorphisms in modern genetics, no practical method has been developed that permits the use of highly parallel analysis of many SNP alleles in two or more individuals in genetic analysis.

[0008] The present invention provides such an improved method. Indeed, the present invention provides methods and gene sequences that permit the genetic analysis of identity and parentage, and the diagnosis of disease by discerning the variation of single nucleotide polymorphisms.

SUMMARY OF THE INVENTION

[0009] The present invention is directed to molecules that comprise single nucleotide polymorphisms (SNPs) that are present in mammalian DNA, and in particular, to equine and human genomic DNA polymorphisms. The invention is directed to methods for (i) identifying novel single nucleotide polymorphisms, (ii) repeatedly analyzing and testing these SNPs in different samples, and (iii) exploiting the existence of such sites in the genetic analysis of single animals and populations of animals.

[0010] The analysis (genotyping) of such sites is useful in determining identity, ancestry, predisposition to genetic disease, the presence or absence of a desired trait, etc. In detail, the invention provides a nucleic acid primer molecule having a polynucleotide sequence complementary to an “invariant” nucleotide sequence of a genomic DNA segment of a mammal, the genomic segment being located immediately 3′-distal to a single nucleotide polymorphic site, X, of a single nucleotide polymorphic allele of the mammal; and wherein template-dependent extension of the nucleic acid primer molecule by a single nucleotide extends the primer molecule by a single nucleotide, the single nucleotide being complementary to the nucleotide, X, of the single nucleotide polymorphic allele. The invention particularly concerns the embodiment wherein the mammal is selected from the group consisting of humans, non-human primates, dogs, cats, cattle, sheep, and horses.

[0011] The invention particularly concerns the embodiments wherein the mammal is a horse, and wherein the nucleic acid molecule has a nucleotide sequence selected from the group consisting of SEQ ID NO: (2n+1) [refer to Table 1], wherein n is an integer selected from the group consisting of 0 through 35, or wherein the sequence of the immediately 3′-distal segment includes a sequence selected from the group consisting of SEQ ID NO: (2n+2), wherein n is an integer selected from the group consisting of 0 through 35.

[0012] The invention also provides a nucleic acid molecule having a sequence complementary to a sequence selected from the group consisting of SEQ ID NO: 1 through SEQ ID NO: 72. The invention also provides a set of at least two of such nucleic acid molecules.

[0013] The invention also provides a set of at least two nucleic acid molecules, wherein at least one of the nucleic acid molecules has a sequence complementary to a sequence selected from the group consisting of SEQ ID NO: 1 through SEQ ID NO: 72.

[0014] The invention also provides a method for determining the extent of genetic similarity between DNA of a target horse and DNA of a reference horse, which comprises the steps:

[0015] A) determining, for a single nucleotide polymorphism of the target horse, and for a corresponding single nucleotide polymorphism of the reference horse, whether the polymorphisms contain the same single nucleotide at their respective polymorphic sites; and

[0016] B) using the comparison to determine the extent of genetic similarity between the target horse and the reference horse.
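By way of illustration only, steps A and B above might be sketched as follows. The sketch assumes that each horse has already been genotyped, that genotypes are recorded as two-letter strings, and that similarity is scored as the fraction of shared loci with identical genotypes; the scoring rule, the function name `snp_similarity`, the locus names other than 177-2, and all genotypes shown are hypothetical and are not taken from the specification.

```python
def snp_similarity(target, reference):
    """Return the fraction of shared SNP loci with identical genotypes."""
    shared = sorted(set(target) & set(reference))
    if not shared:
        raise ValueError("the two animals share no typed SNP loci")
    # A diploid genotype is an unordered pair of nucleotides, so "CT" == "TC".
    matches = sum(sorted(target[locus]) == sorted(reference[locus]) for locus in shared)
    return matches / len(shared)


# Hypothetical genotypes; only the clone name 177-2 comes from the text.
target_horse = {"177-2": "CT", "locus_b": "AA", "locus_c": "GT"}
reference_horse = {"177-2": "TC", "locus_b": "AG", "locus_c": "GT"}
similarity = snp_similarity(target_horse, reference_horse)
print(f"{similarity:.2f}")
```

Other scoring rules (for example, counting shared alleles rather than identical genotypes) would fit the same two steps equally well.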

[0017] The invention also concerns the embodiment of such method wherein the polymorphic sites are flanked by (1) an immediately 5′-proximal sequence selected from the group consisting of SEQ ID NO: (2n+1), and (2) an immediately 3′-distal sequence selected from the group consisting of SEQ ID NO: (2n+2); wherein n is an integer selected from the group consisting of 0 through 35.

[0018] The invention particularly concerns the embodiment wherein, in step A, the determination is accomplished by a method having the sub-steps:

[0019] (a) incubating a sample of nucleic acid containing the single nucleotide polymorphism of the target horse, or the single nucleotide polymorphism of the reference horse, in the presence of a nucleic acid primer and at least one dideoxynucleotide derivative, under conditions sufficient to permit a polymerase mediated, template-dependent extension of the primer, the extension causing the incorporation of a single dideoxynucleotide to the 3′-terminus of the primer, the single dideoxynucleotide being complementary to the single nucleotide of the polymorphic site of the polymorphism;

[0020] (b) permitting the template-dependent extension of the primer molecule, and the incorporation of the single dideoxynucleotide; and

[0021] (c) determining the identity of the nucleotide incorporated into the polymorphic site, the identified nucleotide being complementary to the nucleotide of the polymorphic site.

[0022] The invention further concerns the embodiment of the above methods wherein the template-dependent extension of the primer is conducted in the presence of at least two dideoxynucleotide triphosphate derivatives selected from the group consisting of ddATP, ddTTP, ddCTP and ddGTP, but in the absence of dATP, dTTP, dCTP and dGTP.

[0023] The invention particularly concerns the sub-embodiments of the above methods wherein the nucleic acid of the sample is amplified in vitro prior to the incubation, and/or the primer is immobilized to a solid support.

[0024] The invention further concerns the embodiment of the above methods wherein a non-invasive swab is used to collect the sample of DNA.

[0025] The invention further provides a method for determining the probability that a target horse will have a particular trait, which comprises the steps:

[0026] A) determining the identity of a single nucleotide present at a polymorphic site of an equine single nucleotide polymorphism, and being present in more than 51% of a set of reference horses;

[0027] B) determining whether a single nucleotide present at a polymorphic site of a corresponding single nucleotide polymorphism of the target horse has the same identity as the single nucleotide present at the polymorphic site of the 51% of reference horses exhibiting the trait;

[0028] C) using the determination of step B to establish the probability that the target horse will have the particular trait.

[0029] The invention further provides a method for creating a genetic map of unique sequence equine polymorphisms which comprises the steps:

[0030] A) identifying at least one pair of inter-breeding reference horses, wherein each of the pairs of horses is characterized by having a first and a second reference horse,

[0031] the first reference horse having:

[0032] two alleles (i) and (ii), the alleles each being single nucleotide polymorphic alleles having a single nucleotide polymorphic site;

[0033] the second reference horse having:

[0034] a corresponding allele (i′) to the allele (i) of the first reference horse, wherein the allele (i′) has a single nucleotide polymorphic site, and wherein the single nucleotide present at the polymorphic site of the allele (i′) differs from the single nucleotide present at the polymorphic site of the allele (i) of the first reference horse, and

[0035] B) identifying in a progeny of at least one of the pairs of inter-breeding reference horses the single nucleotide present at a single nucleotide polymorphic site of a corresponding allele of the alleles (i) and (i′), and the single nucleotide present at a single nucleotide polymorphic site of a corresponding allele of the alleles (ii) and (ii′); and

[0036] C) determining the extent of genetic linkage between the alleles (i) and (ii), to thereby create the genetic map.

[0037] The invention further provides a method for predicting whether a target horse will exhibit a predetermined trait which comprises the steps:

[0038] A) identifying one or more alleles associated with the trait, each allele being a single nucleotide polymorphic allele having a single nucleotide polymorphic site;

[0039] B) determining for each of the single nucleotide polymorphic alleles, a nucleotide present at the allele's polymorphic site in a reference horse exhibiting the trait, to thereby define a set of single nucleotides at a set of polymorphic sites that are present in a reference horse exhibiting the trait;

[0040] C) determining the identity of single nucleotides present at corresponding single nucleotide polymorphic alleles of the target horse; and

[0041] D) comparing the identity of the single nucleotides present at the polymorphic sites of the polymorphisms of the reference animal with the single nucleotides present at the corresponding single nucleotide polymorphic alleles of the target horse.

[0042] The invention further provides a method for identifying a single nucleotide polymorphic site which comprises:

[0043] A) isolating a fragment of genomic DNA of a reference organism;

[0044] B) sequencing the fragment of DNA to thereby determine the nucleotide sequence of a segment of the fragment, the segment being of a length sufficient to define the nucleotide sequence of a pair of oligonucleotide primers capable of mediating the specific amplification of the fragment;

[0045] C) using the oligonucleotide primers to mediate the specific amplification of DNA obtained from a plurality of other organisms of the same species as the reference organism; and

[0046] D) determining the nucleotide sequences of the amplified DNA molecules of step C, and comparing the sequence of the amplified molecules with the sequence of the fragment of the reference organism to thereby identify a single nucleotide polymorphic site.
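The comparison of step D can be sketched computationally as shown below. This is a minimal illustration that assumes the amplified sequences have already been aligned to the reference fragment and have the same length, so that only substitutions are considered; the function name `find_candidate_snps` and every sequence shown are hypothetical.

```python
def find_candidate_snps(reference, samples):
    """Map each variable position (0-based) to the sorted nucleotides seen there."""
    sites = {}
    for name, sequence in samples.items():
        if len(sequence) != len(reference):
            raise ValueError(f"{name}: sequence is not aligned to the reference")
        for position, (ref_base, base) in enumerate(zip(reference, sequence)):
            if base != ref_base:
                # Record the reference nucleotide together with each variant seen.
                sites.setdefault(position, {ref_base}).add(base)
    return {position: sorted(bases) for position, bases in sorted(sites.items())}


# Hypothetical reference fragment and amplified sequences from two other animals.
reference_fragment = "ACGTTGCACGGATCCTA"
amplified = {
    "animal_1": "ACGTTGCACGGATCCTA",
    "animal_2": "ACGTTGCATGGATCCTA",
}
candidates = find_candidate_snps(reference_fragment, amplified)
print(candidates)
```

Each reported position is only a candidate; whether it is accepted as a polymorphism depends on further confirmation of the sequence data.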

[0047] The invention also includes a method for interrogating a polymorphic region of a human single nucleotide polymorphism of a target human, the method comprising:

[0048] A) selecting a known human single nucleotide polymorphism for interrogation;

[0049] B) identifying the sequence of at least one oligonucleotide that flanks the selected single nucleotide polymorphism; the identified sequence being of a length sufficient to permit the identification of primers capable of being used to effect the specific amplification of the flanking oligonucleotide and the polymorphism;

[0050] C) using the primers to effect the amplification of the flanking oligonucleotide and the polymorphism of the single nucleotide polymorphism of the target human; and

[0051] D) interrogating the single nucleotide polymorphism of the amplified polymorphism by genetic bit analysis.

BRIEF DESCRIPTION OF THE FIGURES

[0052] FIG. 1 illustrates the preferred method for cloning random genomic fragments. Genomic DNA is size fractionated, and then introduced into a plasmid vector, in order to obtain random clones. PCR primers are designed, and used to sequence the inserted genomic sequences.

[0053] FIG. 2 illustrates the data generated by the preferred method for identifying new polymorphic sequences, which is cycle sequencing of a random genomic fragment.

[0054] FIG. 3 illustrates the RFLP method for screening random clones for polymorphic sequences. After the initial optimization of PCR conditions (top panel), amplified material is cleaved with several restriction enzymes, and the resulting profiles are analyzed (middle panels). A population study is then performed to determine allelic frequencies.

[0055] FIG. 4 shows a graph of the probability that two individuals will have identical genotypes with given panels of genetic markers. The number of tests employed is plotted on the abscissa while the cumulative probability of non-identity is plotted on the ordinate. The horizontal line indicates 0.95 probability of non-identity. Legend: o indicates the extrapolated prototype; x indicates 3 alleles (51%, 34%, 15%); triangle indicates 2 alleles (79%, 21%).

[0056] FIG. 5 shows a graph of the probability that given panels of 20 genetic markers will exclude a random alleged father in a paternity suit in which the mother is not in question. The number of tests employed is plotted on the abscissa while the cumulative probability of exclusion is plotted on the ordinate. The horizontal line indicates 0.95 probability of exclusion. The legend is as in FIG. 4.

[0057] FIG. 6 uses the SNP identified in clone 177-2 to illustrate the organization of the sequences in Table 1.

[0058] FIG. 7 illustrates the preferred method for genotyping SNPs. The seven steps illustrate how GBA can be performed starting with a biological sample.

[0059] FIGS. 8A and 8B illustrate how horse parentage data appears at the microtiter plate level.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

[0060] I. The Single Nucleotide Polymorphisms of the Present Invention and the Advantages of their Use in Genetic Analysis

[0061] A. The Attributes of the Polymorphisms

[0062] The particular gene sequences of interest to the present invention comprise “single nucleotide polymorphisms.” A “polymorphism” is a variation in the DNA sequence of some members of a species. The genomes of animals and plants naturally undergo spontaneous mutation in the course of their continuing evolution (Gusella, J. F., Ann. Rev. Biochem. 55:831-854 (1986)). The majority of such mutations create polymorphisms. The mutated sequence and the initial sequence co-exist in the species' population. In some instances, such co-existence is in stable or quasi-stable equilibrium. In other instances, the mutation confers a survival or evolutionary advantage to the species, and accordingly, it may eventually (i.e. over evolutionary time) be incorporated into the DNA of every member of that species.

[0063] A polymorphism is thus said to be “allelic,” in that, due to the existence of the polymorphism, some members of a species may have the unmutated sequence (i.e. the original “allele”) whereas other members may have a mutated sequence (i.e. the variant or mutant “allele”). In the simplest case, only one mutated sequence may exist, and the polymorphism is said to be diallelic. Diallelic polymorphisms are the most common and the preferred polymorphisms of the present invention. The occurrence of alternative mutations can give rise to triallelic, etc. polymorphisms. An allele may be referred to by the nucleotide(s) that comprise the mutation. Thus, for example, in Table 1, clone 177-2 (SEQ ID NO: 1 and SEQ ID NO: 2) illustrates the sequence of one strand of a diallelic polymorphism in which one allele has a “C” and the other allele has a “T” at the polymorphic site.

[0064] The present invention is directed to a particular class of allelic polymorphisms, and to their use in genotyping a plant or animal. Such allelic polymorphisms are referred to herein as “single nucleotide polymorphisms,” or “SNPs.” “Single nucleotide polymorphisms” are defined by the following attributes. A central attribute of such a polymorphism is that it contains a polymorphic site, “X,” most preferably occupied by a single nucleotide, which is the site of variation between allelic sequences. A second characteristic of an SNP is that its polymorphic site “X” is preferably preceded by and followed by “invariant” sequences of the allele. The polymorphic site of the SNP is thus said to lie “immediately” 3′ to a “5′-proximal” invariant sequence, and “immediately” 5′ to a “3′-distal” invariant sequence. Such sequences flank the polymorphic site.

[0065] As used herein, a sequence is said to be an “invariant” sequence of an allele if the sequence does not vary in the population of the species, and if mapped, would map to a “corresponding” sequence of the same allele in the genome of every member of the species population. Two sequences are said to be “corresponding” sequences if they are analogs of one another obtained from different sources. The gene sequences that encode hemoglobin in two humans illustrate “corresponding” allelic sequences. The definition of “corresponding alleles” provided herein is intended to clarify, but not to alter, the meaning of that term as understood by those of ordinary skill in the art. Each row of Table 1 shows the identity of the nucleotide of the polymorphic site of “corresponding” equine alleles, as well as the invariant 5′-proximal and 3′-distal sequences that are also attributes of that SNP. “Corresponding alleles” are illustrated in Table 5 with regard to human alleles. Each row of Table 5 shows the identity of the nucleotide of the polymorphic site of “corresponding” human alleles, as well as the invariant 5′-proximal and 3′-distal sequences that are also attributes of that SNP.

[0066] Since genomic DNA is double-stranded, each SNP can be defined in terms of either strand. Thus, for every SNP, one strand will contain an immediately 5′-proximal invariant sequence and the other will contain an immediately 3′-distal invariant sequence. In the preferred embodiment, wherein a SNP's polymorphic site, “X,” is a single nucleotide, each strand of the double-stranded DNA of the SNP will contain both an immediately 5′-proximal invariant sequence and an immediately 3′-distal invariant sequence.

[0067] Although the preferred SNPs of the present invention involve a substitution of one nucleotide for another at the SNP's polymorphic site, SNPs can also be more complex, and may comprise a deletion of a nucleotide from, or an insertion of a nucleotide into, one of two corresponding sequences. For example, a particular gene sequence may contain an A in a particular polymorphic site in some animals, whereas in other animals a single or multiple base deletion might be present at that site. Although the preferred SNPs of the present invention have both an invariant proximal sequence and invariant distal sequence, SNPs may have only an invariant proximal or only an invariant distal sequence.

[0068] Nucleic acid molecules having a sequence complementary to that of an immediately 3′-distal invariant sequence of a SNP can, if extended in a “template-dependent” manner, form an extension product that would contain the SNP's polymorphic site. A preferred example of such a nucleic acid molecule is a nucleic acid molecule whose sequence is the same as that of a 5′-proximal invariant sequence of the SNP. “Template-dependent” extension refers to the capacity of a polymerase to mediate the extension of a primer such that the extended sequence is complementary to the sequence of a nucleic acid template. A “primer” is a single-stranded oligonucleotide or a single-stranded polynucleotide that is capable of being extended by the covalent addition of a nucleotide in a “template-dependent” extension reaction. In order to possess such a capability, the primer must have a 3′-hydroxyl terminus, and be hybridized to a second nucleic acid molecule (i.e. the “template”). A primer is typically 11 bases or longer; most preferably, a primer is 20 bases; however, primers of shorter or greater length may suffice. A “polymerase” is an enzyme that is capable of incorporating nucleoside triphosphates to extend a 3′-hydroxyl group of a nucleic acid molecule, if that molecule has hybridized to a suitable template nucleic acid molecule. Polymerase enzymes are discussed in Watson, J. D., In: Molecular Biology of the Gene, 3rd Ed., W. A. Benjamin, Inc., Menlo Park, Calif. (1977), which reference is incorporated herein by reference, and similar texts. Other polymerases such as the large proteolytic fragment of the DNA polymerase I of the bacterium E. coli, commonly known as “Klenow” polymerase, E. coli DNA polymerase I, and bacteriophage T7 DNA polymerase, may also be used to perform the method described herein. Nucleic acids having the same sequence as that of the immediately 3′ distal invariant sequence of a SNP can be ligated in a template dependent fashion to a primer that has the same sequence as that of the immediately 5′ proximal sequence that has been extended by one nucleotide in a template dependent fashion.

[0069] B. The Advantages of Using SNPs in Genetic Analysis

[0070] The single nucleotide polymorphic sites of the present invention can be used to analyze the DNA of any plant or animal. Such sites are particularly suitable for analyzing the genome of mammals, including humans, non-human primates, domestic animals (such as dogs, cats, etc.), farm animals (such as cattle, sheep, etc.) and other economically important animals, in particular, horses. They may, however, be used with regard to other types of animals, particularly birds (such as chickens, turkeys, etc.). SNPs have several salient advantages over RFLPs, STRs and VNTRs.

[0071] First, SNPs occur at greater frequency (approximately 10-100 fold greater), and with greater uniformity than RFLPs and VNTRs. The greater frequency of SNPs means that they can be more readily identified than the other classes of polymorphisms. The greater uniformity of their distribution permits the identification of SNPs “nearer” to a particular trait of interest. The combined effect of these two attributes makes SNPs extremely valuable. For example, if a particular trait (e.g. predisposition to cancer) reflects a mutation at a particular locus, then any polymorphism that is linked to the particular locus can be used to predict the probability that an individual will exhibit that trait.

[0072] The value of such a prediction is determined in part by the distance between the polymorphism and the locus. Thus, if the locus is located far from any repeated tandem nucleotide sequence motifs, VNTR analysis will be of very limited value. Similarly, if the locus is far from any detectable RFLP, an RFLP analysis would not be accurate. However, since the SNPs of the present invention are present approximately once every 300 bases in the mammalian genome, and exhibit uniformity of distribution, a SNP can, statistically, be found within 150 bases of any particular genetic lesion or mutation. Indeed, the particular mutation may itself be an SNP. Thus, where such locus has been sequenced, the variation in that locus' nucleotide is determinative of the trait in question.

[0073] Second, SNPs are more stable than other classes of polymorphisms. Their spontaneous mutation rate is approximately 10⁻⁹, approximately 1,000 times lower than that of VNTRs. Significantly, VNTR-type polymorphisms are characterized by high mutation rates.

[0074] Third, SNPs have the further advantage that their allelic frequency can be inferred from the study of relatively few representative samples. These attributes of SNPs permit a much higher degree of genetic resolution of identity, paternity exclusion, and analysis of an animal's predisposition for a particular genetic trait than is possible with either RFLP or VNTR polymorphisms.

[0075] Fourth, SNPs reflect the highest possible definition of genetic information: nucleotide position and base identity. Despite providing such a high degree of definition, SNPs can be detected more readily than either RFLPs or VNTRs, and with greater flexibility. Indeed, because DNA is double-stranded, the complementary strand of the allele can be analyzed to confirm the presence and identity of any SNP.

[0076] The flexibility with which an identified SNP can be characterized is a salient feature of SNPs. VNTR-type polymorphisms, for example, are most easily detected through size fractionation methods that can discern a variation in the number of the repeats. RFLPs are most easily detected by size fractionation methods following restriction digestion.

[0077] In contrast, SNPs can be characterized using any of a variety of methods. Such methods include the direct or indirect sequencing of the site, the use of restriction enzymes where the respective alleles of the site create or destroy a restriction site, the use of allele-specific hybridization probes, the use of antibodies that are specific for the proteins encoded by the different alleles of the polymorphism, or other biochemical interpretation.

[0078] The “Genetic Bit Analysis” (“GBA”) method disclosed by Goelet, P. et al. (WO 92/15712, herein incorporated by reference), and discussed below, is a preferred method for detecting the single nucleotide polymorphisms of the present invention. GBA is a method of polymorphic site interrogation in which the nucleotide sequence information surrounding the site of variation in a target DNA sequence is used to design an oligonucleotide primer that is complementary to the region immediately adjacent to, but not including, the variable nucleotide in the target DNA. The target DNA template is selected from the biological sample and hybridized to the interrogating primer. This primer is extended by a single labeled dideoxynucleotide using DNA polymerase in the presence of two, and preferably all four chain terminating nucleoside triphosphate precursors. Cohen, D. et al. (PCT Application WO91/02087) describes a related method of genotyping.
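The primer design logic of GBA can be illustrated with a short sketch. It assumes the template strand is written 5′ to 3′, that the polymorphic site is given as a 0-based index, and that the primer anneals to the invariant sequence immediately 3′-distal to that site, so that its 3′-terminus abuts the variable nucleotide. The function names, the eight-base primer length and the template sequences are hypothetical; only the C/T alleles echo the clone 177-2 example.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(sequence))


def design_gba_primer(template, site, length=20):
    """Primer (5' to 3') complementary to the bases immediately 3' of the site."""
    distal = template[site + 1 : site + 1 + length]
    if len(distal) < length:
        raise ValueError("not enough 3'-distal sequence for the requested length")
    return reverse_complement(distal)


def expected_terminator(template, site):
    """Dideoxynucleotide complementary to the template nucleotide at the site."""
    return "dd" + COMPLEMENT[template[site]] + "TP"


# Hypothetical flanking sequences around a C/T polymorphic site at index 8.
proximal, distal = "ACGTTGCA", "GGATCCTA"
template_c = proximal + "C" + distal
template_t = proximal + "T" + distal

primer = design_gba_primer(template_c, site=8, length=8)
print(primer, expected_terminator(template_c, 8), expected_terminator(template_t, 8))
```

The same primer serves both alleles because it does not include the variable nucleotide; only the incorporated terminator differs.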

[0079] Recently, several primer-guided nucleotide incorporation procedures for assaying polymorphic sites in DNA have been described (Kornher, J. S. et al., Nucl. Acids Res. 17:7779-7784 (1989); Sokolov, B. P., Nucl. Acids Res. 18:3671 (1990); Syvänen, A.-C., et al., Genomics 8:684-692 (1990); Kuppuswamy, M. N. et al., Proc. Natl. Acad. Sci. (U.S.A.) 88:1143-1147 (1991); Prezant, T. R. et al., Hum. Mutat. 1:159-164 (1992); Ugozzoli, L. et al., GATA 9:107-112 (1992); Nyrén, P. et al., Anal. Biochem. 208:171-175 (1993)). These methods differ from GBA in that they all rely on the incorporation of labeled deoxynucleotides to discriminate between bases at a polymorphic site. In such a format, since the signal is proportional to the number of deoxynucleotides incorporated, polymorphisms that occur in runs of the same nucleotide can result in signals that are proportional to the length of the run (Syvänen, A.-C., et al., Amer. J. Hum. Genet. 52:46-59 (1993)). Such a range of locus-specific signals could be more complex to interpret, especially for heterozygotes, compared to the simple, ternary (2:0, 1:1, or 0:2) class of signals produced by the GBA method. In addition, for some loci, incorporation of an incorrect deoxynucleotide can occur even in the presence of the correct dideoxynucleotide (Kornher, J. S. et al., Nucl. Acids Res. 17:7779-7784 (1989)). Such deoxynucleotide misincorporation events may be due to the Km of the DNA polymerase for the mispaired deoxy-substrate being comparable, in some sequence contexts, to the relatively poor Km of even a correctly base paired dideoxy-substrate (Kornberg, A., et al., In: DNA Replication, 2nd Edition, W. H. Freeman and Co., New York (1992); Tabor, S. et al., Proc. Natl. Acad. Sci. (U.S.A.) 86:4076-4080 (1989)). This effect would contribute to the background noise in the polymorphic site interrogation.
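The ternary (2:0, 1:1, or 0:2) signal classes lend themselves to a simple genotype-calling rule, sketched below for a diallelic site. The sketch assumes one background-corrected signal per allele; the function name `call_genotype` and the numeric thresholds (`min_total`, `het_band`) are hypothetical illustration values, not parameters disclosed for GBA.

```python
def call_genotype(signal_1, signal_2, alleles=("C", "T"),
                  min_total=0.2, het_band=(0.3, 0.7)):
    """Classify two allele signals as homozygote, heterozygote, or no call (None)."""
    total = signal_1 + signal_2
    if total < min_total:
        return None  # too little signal to make any call
    fraction = signal_1 / total  # share of the signal from the first allele
    if fraction >= het_band[1]:
        return alleles[0] * 2  # approximately 2:0
    if fraction <= het_band[0]:
        return alleles[1] * 2  # approximately 0:2
    return alleles[0] + alleles[1]  # approximately 1:1


# Hypothetical signal pairs for four samples.
calls = [call_genotype(1.8, 0.05), call_genotype(0.9, 1.0),
         call_genotype(0.02, 1.7), call_genotype(0.05, 0.05)]
print(calls)
```

Because each primer is extended by exactly one terminator, the signal ratio does not depend on the length of any homopolymer run at the site, which is what keeps the three classes well separated.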

[0080] II. Methods for Discovering Novel Polymorphic Sites

[0081] A preferred method for discovering polymorphic sites involves comparative sequencing of genomic DNA fragments from a number of haploid genomes. In the preferred embodiment, illustrated in FIG. 1, such sequencing is performed by preparing a random genomic library that contains 0.5-3 kb fragments of DNA derived from one member of a species. Sequences of these recombinants are then used to facilitate PCR sequencing of a number of randomly selected individuals of that species at the same genomic loci.

[0082] From such genomic libraries (typically of approximately 50,000 clones), several hundred (200-500) individual clones are purified, and the sequences of the termini of their inserts are determined. Only a small amount of terminal sequence data (100-200 bases) need be obtained to permit PCR amplification of the cloned region. The purpose of the sequencing is to obtain enough sequence information to permit the synthesis of primers suitable for mediating the amplification of the equivalent fragments from genomic DNA samples of other members of the species. Preferably, such sequence determinations are performed using cycle sequencing methodology.

[0083] The primers are used to amplify DNA from a panel of randomly selected members of the target species. The number of members in the panel determines the lowest frequency of the polymorphisms that are to be isolated. Thus, if six members are evaluated, a polymorphism that exists at a frequency of, for example, 0.01 might not be identified. In an illustrative, but oversimplified, mathematical treatment, a sampling of six members would be expected to identify only those polymorphisms that occur at a frequency of greater than about 0.08 (i.e. 1.0 total frequency divided by 6 members divided by 2 alleles per genome).

[0084] Thus, if one desires the identification of less frequent polymorphisms, a greater number of panel members must be evaluated.
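The oversimplified treatment above can be written out as a short calculation. The sketch below also adds a binomial estimate of the chance that an allele is observed at least k times in the panel; that estimate is an illustrative assumption (independent sampling of alleles), not part of the treatment given in the text, and the function names are hypothetical.

```python
from math import comb


def min_detectable_frequency(n_members, alleles_per_genome=2):
    """Lowest allele frequency expected to appear once in the panel:
    1.0 total frequency / members / alleles per genome."""
    return 1.0 / (n_members * alleles_per_genome)


def prob_seen_at_least(k, freq, n_members, alleles_per_genome=2):
    """Binomial probability that an allele of frequency `freq` appears
    at least k times among the sampled haploid genomes (assumes
    independent sampling; an illustrative refinement only)."""
    n = n_members * alleles_per_genome
    p_fewer = sum(
        comb(n, i) * freq**i * (1.0 - freq) ** (n - i) for i in range(k)
    )
    return 1.0 - p_fewer


if __name__ == "__main__":
    print(round(min_detectable_frequency(6), 3))   # 1 / 12
    print(round(prob_seen_at_least(1, 0.01, 6), 3))
    print(round(prob_seen_at_least(1, 0.01, 50), 3))
```

Under these assumptions, a polymorphism at a frequency of 0.01 is unlikely to be seen in a panel of six members, and the chance of seeing it rises as the panel grows.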

[0085] Cycle sequence analysis (Mullis, K. et al., Cold Spring Harbor Symp. Quant. Biol. 51:263-273 (1986); Erlich H. et al., European Patent Appln. 50,424; European Patent Appln. 84,796, European Patent Application 258,017, European Patent Appln. 237,362;

[0086] Mullis, K., European Patent Appln. 201,184; Mullis K. et al., U.S. Pat. No. 4,683,202; Erlich, H., U.S. Pat. No. 4,582,788; and Saiki, R. et al., U.S. Pat. No. 4,683,194)) is facilitated through the use of automated DNA sequencing instruments and software (Applied Biosystems, Inc.). Differences between sequences of different animals can thereby be identified and confirmed by inspecting the relevant portion of the chromatograms on the computer screen. Differences are interpreted to reflect a DNA polymorphism only if the data was available for both strands, and present in more than one haploid example among the population of animals tested. FIG. 2 illustrates the preferred method for identifying new polymorphic sequences which is cycle sequencing of a random genomic fragment. The PCR fragments from five unrelated horses were electroeluted from acrylamide gels and sequenced using repetitive cycles of thermostable Taq DNA polymerase in the presence of a mixture of dNTPs and fluorescent ddNTPs. The products were then separated and analyzed using an automated DNA sequencing instrument of Applied Biosystems, Inc. The data was analyzed using ABI software. Differences between sequences of different animals were identified by the software and confirmed by inspecting the relevant portion of the chromatograms on the computer screen. Differences are presented as “DNA Polymorphisms” only if the data is available for both strands and present in more than one haploid example among the five horses tested. The top panel shows an “A” homozygote, the middle panel an “AT” heterozygote and the bottom panel a “T” homozygote.

[0087] Despite the randomized nature of such a search for polymorphisms, such sequencing and comparison of random DNA clones is readily able to identify suitable polymorphisms. Indeed, with respect to the horse, approximately 1/400 nucleotides sequenced by these methods would be discovered as the polymorphic site of an SNP.

[0088] The discovery of polymorphic sites can alternatively be conducted using the strategy outlined in FIG. 3. In this embodiment, the DNA sequence polymorphisms are identified by comparing the restriction endonuclease cleavage profiles generated by a panel of several restriction enzymes on products of the PCR reaction from the genomic templates of unrelated members. Most preferably, each of the restriction endonucleases used will have four base recognition sequences, and will therefore allow a desirable number of cuts in the amplified products.

[0089] The restriction digestion patterns obtained from the genomic DNAs are preferably compared directly to the patterns obtained from PCR products generated using the corresponding plasmid templates. Such a comparison provides an internal control which indicates that the amplified sequences from the genomic and plasmid DNAs derive from equivalent loci. This control also allows identification of primers that fortuitously amplify repeated sequences, or multicopy loci, since these will generate many more fragments from the genomic DNA templates than from the plasmid templates.
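The comparison of cleavage profiles can be illustrated with an in silico digest. The sketch below is a simplified model under stated assumptions: the two amplified sequences are known (in practice only gel fragment sizes are observed), digestion is complete, and the enzyme panel is an illustrative selection of endonucleases with four base recognition sequences rather than a panel named in the text. The function names and the two test sequences are hypothetical.

```python
# Recognition site and cut offset (bases from the start of the site)
# for an illustrative panel of four-base cutters.
ENZYMES = {
    "AluI": ("AGCT", 2),
    "HaeIII": ("GGCC", 2),
    "MspI": ("CCGG", 1),
    "TaqI": ("TCGA", 1),
}


def fragment_lengths(seq, site, cut_offset):
    """Sorted fragment lengths after complete digestion of a linear
    molecule at every occurrence of `site`."""
    seq = seq.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return sorted(b - a for a, b in zip(bounds, bounds[1:]))


def differing_enzymes(seq_a, seq_b, enzymes=ENZYMES):
    """Names of enzymes whose cleavage profiles differ between two
    amplified products, suggesting a sequence polymorphism."""
    return [
        name
        for name, (site, offset) in enzymes.items()
        if fragment_lengths(seq_a, site, offset)
        != fragment_lengths(seq_b, site, offset)
    ]


if __name__ == "__main__":
    plasmid_product = "TTTTAGCTTTTTGGCCTTTT"
    genomic_product = "TTTTAACTTTTTGGCCTTTT"  # one base differs in the AluI site
    print(differing_enzymes(plasmid_product, genomic_product))
```

An empty result for a genomic product compared against its plasmid counterpart corresponds to the internal control described above, in which the two templates give equivalent patterns.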

[0090] III. Methods for Genotyping the Single Nucleotide Polymorphisms of the Present Invention

[0091] Any of a variety of methods can be used to identify the polymorphic site, “X,” of a single nucleotide polymorphism of the present invention. The preferred method of such identification involves directly ascertaining the sequence of the polymorphic site for each polymorphism being analyzed. This approach is thus markedly different from the RFLP method which analyzes patterns of bands rather than the specific sequence of a polymorphism.

[0092] A. Sampling Methods

[0093] Nucleic acid specimens may be obtained from an individual of the species that is to be analyzed using either “invasive” or “non-invasive” sampling means. A sampling means is said to be “invasive” if it involves the collection of nucleic acids from within the skin or organs of an animal (including, especially, a murine, a human, an ovine, an equine, a bovine, a porcine, a canine, or a feline animal). Examples of invasive methods include blood collection, semen collection, needle biopsy, pleural aspiration, etc. Examples of such methods are discussed by Kim, C. H. et al. (J. Virol. 66:3879-3882 (1992)); Biswas, B. et al. (Annals NY Acad. Sci. 590:582-583 (1990)); Biswas, B. et al. (J. Clin. Microbiol. 29:2228-2233 (1991)).

[0094] In contrast, a “non-invasive” sampling means is one in which the nucleic acid molecules are recovered from an internal or external surface of the animal. Examples of such “non-invasive” sampling means include “swabbing,” collection of tears, saliva, urine, fecal material, sweat or perspiration, etc. As used herein, “swabbing” denotes contacting an applicator/collector (“swab”) containing or comprising an adsorbent material to a surface in a manner sufficient to collect surface debris and/or dead or sloughed off cells or cellular debris. Such collection may be accomplished by swabbing nasal, oral, rectal, vaginal or aural orifices, by contacting the skin or tear ducts, by collecting hair follicles, etc.

[0095] Nasal swabs have been used to obtain clinical specimens for PCR amplification (Olive, D. M. et al., J. Gen. Virol. 71:2141-2147 (1990); Wheeler, J. G. et al., Amer. J. Vet. Res. 52:1799-1803 (1991)). The use of hair follicles to identify VNTR polymorphisms for paternity testing in horses has been described by Ellegren, H. et al. (Animal Genetics 23:133-142 (1992)). The reference states that a standardized testing system based on PCR-analyzed microsatellite polymorphisms is likely to be an alternative to blood typing for paternity testing.

[0096] A preferred swab for the collection of DNA will comprise a solid support, at least a portion of which is designed to adsorb DNA. The portion designed to adsorb DNA may be of a compressible texture, such as a “foam rubber,” or the like. Alternatively, it may be an adsorptive fibrous composition, such as cotton, polyester, nylon, or the like. In yet another embodiment, the portion designed to adsorb DNA may be an abrasive material, such as a bristle or brush, or having a rough surface. The portion of the swab that is designed to adsorb DNA may be a combination of the above textures and compositions (such as a compressible brush, etc.). The swab will, preferably, be specially formed in a substantially rod-like, arrow-like or mushroom-like shape, such that it will have a segment that can be held by the collecting individual, and a tip or end portion which can be placed into contact with the surface that contains the sample DNA that is to be collected. In one embodiment, the swab will be provided with a storage chamber, such as a plastic or glass tube or cylinder, which may have one open end, such as a test-tube. Alternatively, the tube may have two open ends, such that after swabbing, the collector can pull on one end of the swab so as to cause the other end of the swab to be withdrawn into the tube. In yet another embodiment, the tube may have two open ends, such that after swabbing, the tube can be converted into a column to assist in the further processing of the collected DNA. In one embodiment, the end or ends of the storage chamber are self-sealing after swabbing has been accomplished.

[0097] The swab or the storage chamber may contain antimicrobial agents at concentrations sufficient to prevent the proliferation of microbes (bacteria, yeast, molds, etc.) during subsequent storage or handling.

[0098] In one embodiment, the swab or storage chamber will contain a chromogenic reagent which reacts to the presence of DNA to yield a detectable signal that can be identified at the time of sample collection. Most preferably, such a reagent will comprise a minimum concentration “open-end point” assay for DNA. Such an assay is capable of detecting concentrations of nucleic acids that range from the minimum detection level of the assay to the maximum saturation level of the assay. This saturation level is adjustable, and can be increased by decreasing the time of reaction. Preferred chromogenic reagents include anti-DNA antibodies that are conjugated to enzymes, diaminopimelic acid, etc.

[0099] B. Amplification-Based Analysis

[0100] The detection of polymorphic sites in a sample of DNA may be facilitated through the use of DNA amplification methods. Such methods specifically increase the concentration of sequences that span the polymorphic site, or include that site and sequences located either distal or proximal to it. Such amplified molecules can be readily detected by gel electrophoresis or other means.

[0101] The most preferred method of achieving such amplification employs PCR, using primer pairs that are capable of hybridizing to the proximal sequences that define a polymorphism in its double-stranded form.

[0102] In lieu of PCR, alternative methods, such as the “Ligase Chain Reaction” (“LCR”) may be used (Barany, F., Proc. Natl. Acad. Sci. (U.S.A.) 88:189-193 (1991)). LCR uses two pairs of oligonucleotide probes to exponentially amplify a specific target. The sequences of each pair of oligonucleotides are selected to permit the pair to hybridize to abutting sequences of the same strand of the target. Such hybridization forms a substrate for a template-dependent ligase. As with PCR, the resulting products thus serve as a template in subsequent cycles and an exponential amplification of the desired sequence is obtained.

[0103] In accordance with the present invention, LCR can be performed with oligonucleotides having the proximal and distal sequences of the same strand of a polymorphic site. In one embodiment, either oligonucleotide will be designed to include the actual polymorphic site of the polymorphism. In such an embodiment, the reaction conditions are selected such that the oligonucleotides can be ligated together only if the target molecule either contains or lacks the specific nucleotide that is complementary to the polymorphic site present on the oligonucleotide.
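The allele-specific embodiment can be pictured with a deliberately simplified model in which two abutting oligonucleotides are treated as ligatable only when, joined end to end, they are perfectly complementary to the target strand. The sequences and function names below are hypothetical, and the requirement of a perfect match over the full length of both oligonucleotides is an assumption made for the sketch; real ligation conditions are tuned so that discrimination occurs at the junction.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def can_ligate(target, upstream_oligo, downstream_oligo):
    """True if the two oligonucleotides (each written 5'->3') anneal
    perfectly to abutting positions on `target`, so that the 3' end of
    the upstream oligonucleotide meets the 5' end of the downstream one."""
    joined = upstream_oligo + downstream_oligo
    return revcomp(joined) in target


if __name__ == "__main__":
    # Hypothetical targets differing at a single polymorphic site (A or G).
    target_a = "ACGTTGCAAGACTTAACCGG"
    target_g = "ACGTTGCAAGGCTTAACCGG"
    # The upstream oligonucleotide carries the polymorphic base at its
    # 3' terminus: T pairs with the A allele, C with the G allele.
    upstream_for_a = "GTTAAGT"
    upstream_for_g = "GTTAAGC"
    downstream = "CTTGC"
    print(can_ligate(target_a, upstream_for_a, downstream))
    print(can_ligate(target_g, upstream_for_a, downstream))
    print(can_ligate(target_g, upstream_for_g, downstream))
```

Placing the polymorphic base at the junction is a design choice of this example; the text requires only that one of the oligonucleotides include the polymorphic site.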

[0104] In an alternative embodiment, the oligonucleotides will not include the polymorphic site, such that when they hybridize to the target molecule, a “gap” is created (see, Segev, D., PCT Application WO 90/01069). This gap is then “filled” with complementary dNTPs (as mediated by DNA polymerase), or by an additional pair of oligonucleotides. Thus, at the end of each cycle, each single strand has a complement capable of serving as a target during the next cycle and exponential amplification of the desired sequence is obtained.

[0105] The “Oligonucleotide Ligation Assay” (“OLA”) (Landegren, U. et al., Science 241:1077-1080 (1988)) shares certain similarities with LCR and may also be adapted for use in polymorphic analysis.

[0106] The OLA protocol uses two oligonucleotides which are designed to be capable of hybridizing to abutting sequences of a single strand of a target. OLA, like LCR, is particularly suited for the detection of point mutations. Unlike LCR, however, OLA results in “linear” rather than exponential amplification of the target sequence.

[0107] Nickerson, D. A. et al. have described a nucleic acid detection assay that combines attributes of PCR and OLA (Nickerson, D. A. et al., Proc. Natl. Acad. Sci. (U.S.A.) 87:8923-8927 (1990)). In this method, PCR is used to achieve the exponential amplification of target DNA, which is then detected using OLA. In addition to requiring multiple, and separate, processing steps, one problem associated with such combinations is that they inherit all of the problems associated with PCR and OLA.

[0108] Schemes based on ligation of two (or more) oligonucleotides in the presence of nucleic acid having the sequence of the resulting “di-oligonucleotide”, thereby amplifying the di-oligonucleotide, are also known (Wu, D. Y. et al., Genomics 4:560 (1989)), and may be readily adapted to the purposes of the present invention.

[0109] Other known nucleic acid amplification procedures, such as transcription-based amplification systems (Malek, L. T. et al., U.S. Pat. No. 5,130,238; Davey, C. et al., European Patent Application 329,822; Schuster et al., U.S. Pat. No. 5,169,766; Miller, H. I. et al., PCT appln. WO 89/06700; Kwoh, D. et al., Proc. Natl. Acad. Sci. (U.S.A.) 86:1173 (1989); Gingeras, T. R. et al., PCT application WO 88/10315), or isothermal amplification methods (Walker, G. T. et al., Proc. Natl. Acad. Sci. (U.S.A.) 89:392-396 (1992)) may also be used.

[0110] C. Preparation of Single-Stranded DNA

[0111] The direct analysis of the sequence of an SNP of the present invention can be accomplished using either the “dideoxy-mediated chain termination method,” also known as the “Sanger Method” (Sanger, F., et al., J. Molec. Biol. 94:441 (1975)) or the “chemical degradation method,” also known as the “Maxam-Gilbert method” (Maxam, A. M., et al., Proc. Natl. Acad. Sci. (U.S.A.) 74:560 (1977), both references herein incorporated by reference). Methods for sequencing DNA using either the dideoxy-mediated method or the Maxam-Gilbert method are widely known to those of ordinary skill in the art. Such methods are, for example, disclosed in Sambrook, J. et al., Molecular Cloning, a Laboratory Manual, 2nd Edition, Cold Spring Harbor Press, Cold Spring Harbor, N.Y. (1989), and in Zyskind, J. W., et al., Recombinant DNA Laboratory Manual, Academic Press, Inc., New York (1988), both herein incorporated by reference.

[0112] Where a nucleic acid sample contains double-stranded DNA (or RNA), or where a double-stranded nucleic acid amplification protocol (such as PCR) has been employed, it is generally desirable to conduct such sequence analysis after treating the double-stranded molecules so as to obtain a preparation that is enriched for, and preferably predominantly, only one of the two strands.

[0113] The simplest method for generating single-stranded DNA molecules from double-stranded DNA is denaturation using heat or alkali treatment.

[0114] Single-stranded DNA molecules may also be produced using the single-stranded DNA bacteriophage M13 (Messing, J. et al., Meth. Enzymol. 101:20 (1983); see also, Sambrook, J., et al. (In: Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (1989))).

[0115] Several alternative methods can be used to generate single-stranded DNA molecules. Gyllensten, U. et al. (Proc. Natl. Acad. Sci. (U.S.A.) 85:7652-7656 (1988)) and Mihovilovic, M. et al. (BioTechniques 7(1):14 (1989)) describe a method, termed “asymmetric PCR,” in which the standard “PCR” method is conducted using primers that are present in different molar concentrations.

[0116] Higuchi, R. G. et al. (Nucleic Acids Res. 17:5865 (1985)) exemplifies an additional method for generating single-stranded amplification products. The method entails phosphorylating the 5′-terminus of one strand of a double-stranded amplification product, and then permitting a 5′→3′ exonuclease (such as λ exonuclease) to preferentially degrade the phosphorylated strand.

[0117] Other methods have also exploited the nuclease resistant properties of phosphorothioate derivatives in order to generate single-stranded DNA molecules (Benkovic et al., U.S. Pat. No. 4,521,509 (Jun. 4, 1985); Sayers, J. R. et al., Nucl. Acids Res. 16:791-802 (1988); Eckstein, F. et al., Biochemistry 15:1685-1691 (1976); Ott, J. et al., Biochemistry 26:8237-8241 (1987)).

[0118] A discussion of the relative advantages and disadvantages of such methods of producing single-stranded molecules is provided by Nikiforov, T. (U.S. patent application Ser. No. 08/005,061, herein incorporated by reference).

[0119] Most preferably, such single-stranded molecules will be produced using the methods described by Nikiforov, T. (U.S. patent application Ser. No. 08/005,061, herein incorporated by reference). In brief, these methods employ nuclease resistant nucleotide derivatives, and incorporate such derivatives, by chemical synthesis or enzymatic means, into primer molecules, or their extension products, in place of naturally occurring nucleotides.

[0120] Suitable nucleotide derivatives include derivatives in which one or two of the non-bridging oxygens of the phosphate moiety of a nucleotide has been replaced with a sulfur-containing group (especially a phosphorothioate), an alkyl group (especially a methyl or ethyl alkyl group), a nitrogen-containing group (especially an amine), and/or a selenium-containing group, etc.

[0121] Phosphorothioate deoxyribonucleotide or ribonucleotide derivatives (e.g. a nucleoside 5′-O-1-thiotriphosphate) are the most preferred nucleotide derivatives. Any of a variety of chemical methods may be used to produce such phosphorothioate derivatives (see, for example, Zon, G. et al., Anti-Canc. Drug Des. 6:539-568 (1991); Kim, S. G. et al., Biochem. Biophys. Res. Commun. 179:1614-1619 (1991); Vu, H. et al., Tetrahedron Lett. 32:3005-3008 (1991); Taylor, J. W. et al., Nucl. Acids Res. 13:8749-8764 (1985); Eckstein, F. et al., Biochemistry 15:1685-1691 (1976); Ott, J. et al., Biochemistry 26:8237-8241 (1987); Ludwig, J. et al., J. Org. Chem. 54:631-635 (1989), all herein incorporated by reference). Phosphorothioate nucleotide derivatives can also be obtained commercially from Amersham or Pharmacia.

[0122] Importantly, the selected nucleotide derivative must be suitable for in vitro primer-mediated extension and provide nuclease resistance to the region of the nucleic acid molecule in which it is incorporated. In the most preferred embodiment, it must confer resistance to exonucleases that attack double-stranded DNA from the 5′-end (5′→3′ exonucleases). Examples of such exonucleases include bacteriophage T7 gene 6 exonuclease (“T7 exonuclease”) and the bacteriophage lambda exonuclease (“λ exonuclease”). Both T7 exonuclease and λ exonuclease are inhibited to a significant degree by the presence of phosphorothioate bonds so as to allow the selective degradation of one of the strands. However, any double-strand specific, 5′→3′ exonuclease can be used for this process, provided that its activity is affected by the presence of the bonds of the nuclease resistant nucleotide derivatives. The preferred enzyme when using phosphorothioate derivatives is the T7 gene 6 exonuclease, which shows maximal enzymatic activity in the same buffers used for many DNA-dependent polymerases, including Taq polymerase. The 5′→3′ exonuclease resistant properties of phosphorothioate derivative-containing DNA molecules are discussed, for example, in Kunkel, T. A. (In: Nucleic Acids and Molecular Biology, Vol. 2, 124-135 (Eckstein, F. et al., eds.), Springer-Verlag, Berlin, (1988)). The 3′→5′ exonuclease resistant properties of phosphorothioate nucleotide containing nucleic acid molecules are disclosed in Putney, S. D., et al. (Proc. Natl. Acad. Sci. (U.S.A.) 78:7350-7354 (1981)) and Gupta, A. P., et al. (Nucl. Acids. Res., 12:5897-5911 (1984)).

[0123] In addition to being resistant to such exonucleases, nucleic acid molecules that contain phosphorothioate derivatives at restriction endonuclease cleavage recognition sites are resistant to such cleavage. Taylor, J. W., et al. (Nucl. Acids Res., 13:8749-8764 (1985)) discusses the endonuclease resistant properties of phosphorothioate nucleotide containing nucleic acid molecules.

[0124] The nuclease resistance of phosphorothioate bonds has been utilized in a DNA amplification protocol (Walker, G. T. et al., Proc. Natl. Acad. Sci. (U.S.A.) 89:392-396 (1992)). In the Walker et al. method, phosphorothioate nucleotide derivatives are installed within a restriction endonuclease recognition site in one strand of a double-stranded DNA molecule. The presence of the phosphorothioate nucleotide derivatives protects that strand from cleavage, and thus results in the nicking of the unprotected strand by the restriction endonuclease. Amplification is accomplished by cycling the nicking and polymerization of the strands.

[0125] Similarly, this resistance to nuclease attack has been used as the basis for a modified “Sanger” sequencing method (Labeit, S. et al., DNA 5:173-177 (1986)). In the Labeit et al. method, ³⁵S-labeled phosphorothioate nucleotide derivatives were employed in lieu of the dideoxy nucleotides of the “Sanger” method.

[0126] In the most preferred embodiment, the phosphorothioate derivative is included in the primer. The nucleotide derivatives may be incorporated into any position of the primer, but will preferably be incorporated at the 5′-terminus of the primer, most preferably adjacent to one another. Preferably, the primer molecules will be approximately 25 nucleotides in length, and contain from about 4% to about 100%, and more preferably from about 4% to about 40%, and most preferably about 16%, phosphorothioate residues (as compared to total residues). The nucleotides may be incorporated into any position of the primer, and may be adjacent to one another, or interspersed across all or part of the primer.

[0127] In one embodiment, the present invention can be used in concert with an amplification protocol, for example, PCR. In this embodiment, it is preferred to limit the number of phosphorothioate bonds of the primers to about 10 (or approximately half of the length of the primers), so that the primers can be used in a PCR reaction without any changes to the PCR protocol that has been established for non-modified primers. When the primers contain more phosphorothioate bonds, the PCR conditions may require adjustment, especially of the annealing temperature, in order to optimize the reaction.
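The percentages given above translate into small whole numbers of modified residues for a primer of about 25 nucleotides. The following sketch, offered only as an illustration, converts a fraction into a residue count, checks it against the limit of about 10 suggested for unmodified PCR protocols, and marks the modified positions at the 5′-terminus. The asterisk notation, the example primer sequence and the function names are assumptions of the sketch, and residues and bonds are counted interchangeably for simplicity.

```python
def phosphorothioate_count(primer_length=25, fraction=0.16):
    """Number of phosphorothioate residues for a given fraction of
    total residues, rounded to the nearest whole residue."""
    return round(primer_length * fraction)


def within_pcr_limit(count, limit=10):
    """True if the count does not exceed the limit of about 10 above
    which PCR conditions may require adjustment."""
    return count <= limit


def mark_five_prime(primer, count):
    """Return the primer (5'->3') with '*' written after each of the
    first `count` nucleotides to mark phosphorothioate positions."""
    if not 0 <= count <= len(primer):
        raise ValueError("count must lie between 0 and the primer length")
    head = "".join(base + "*" for base in primer[:count])
    return head + primer[count:]


if __name__ == "__main__":
    primer = "ACGTACGTACGTACGTACGTACGTA"  # hypothetical 25-mer
    for fraction in (0.04, 0.16, 0.40, 1.00):
        n = phosphorothioate_count(len(primer), fraction)
        print(fraction, n, within_pcr_limit(n), mark_five_prime(primer, n))
```

With these assumptions, the most preferred 16% corresponds to four modified residues in a 25-mer, and the upper end of the more preferred range (40%) corresponds to the limit of about 10.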

[0128] The incorporation of such nucleotide derivatives into DNA or RNA can be accomplished enzymatically, using a DNA polymerase (Vosberg, H. P. et al., Biochemistry 16: 3633-3640 (1977); Burgers, P. M. J. et al., J. Biol. Chem. 254:6889-6893 (1979); Kunkel, T. A., In: Nucleic Acids and Molecular Biology, Vol. 2, 124-135 (Eckstein, F. et al., eds.), Springer-Verlag, Berlin, (1988); Olsen, D. B. et al., Proc. Natl. Acad. Sci. (U.S.A.) 87:1451-1455 (1990); Griep, M. A. et al., Biochemistry 29:9006-9014 (1990); Sayers, J. R. et al., Nucl. Acids Res. 16:791-802 (1988)). Alternatively, phosphorothioate nucleotide derivatives can be incorporated synthetically into an oligonucleotide (Zon, G. et al., Anti-Canc. Drug Des. 6:539-568 (1991)).

[0129] The primer molecules are permitted to hybridize to a complementary target nucleic acid molecule, and are then extended, preferably via a polymerase, to form an extension product. The presence of the phosphorothioate nucleotides in the primers renders the extension product resistant to nuclease attack. As indicated, the amplification products containing phosphorothioate or other suitable nucleotide derivatives are substantially resistant to “elimination” (i.e. degradation) by “5′→3′” exonucleases such as T7 exonuclease or λ exonuclease, and thus a 5′→3′ exonuclease will be substantially incapable of further degrading a nucleic acid molecule once it has encountered a phosphorothioate residue.

[0130] Since the target molecule lacks nuclease resistant residues, the incubation of the extension product and its template (the target) in the presence of a 5′→3′ exonuclease results in the destruction of the template strand, and thereby achieves the preferential production of the desired single strand.
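The selective degradation described above can be pictured with a toy model in which a 5′→3′ exonuclease removes nucleotides from the 5′ end of a strand until it meets a resistant residue. The model below is an idealization (complete digestion, absolute blockage at the first resistant residue), and the sequences and names are hypothetical.

```python
def digest_5_to_3(strand, resistant_positions=frozenset()):
    """Return what remains of `strand` (written 5'->3') after an
    idealized 5'->3' exonuclease digestion. Digestion stops at the
    first position listed in `resistant_positions` (0-based)."""
    for index in range(len(strand)):
        if index in resistant_positions:
            return strand[index:]
    return ""  # no resistant residue: the strand is fully degraded


if __name__ == "__main__":
    # Extension product whose primer carries four modified residues
    # at its 5' terminus; the template strand is unmodified.
    extension_product = "ACGTACGTTTGACCA"
    template_strand = "TGGTCAAACGTACGT"
    print(repr(digest_5_to_3(extension_product, {0, 1, 2, 3})))
    print(repr(digest_5_to_3(template_strand)))
```

In this picture the protected extension product survives intact while the unprotected template is eliminated, leaving the desired single strand.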

[0131] D. Solid Phase Attachment of DNA

[0132] The preferred method of determining the identity of the polymorphic site of a polymorphism involves nucleic acid hybridization. Although such hybridization can be performed in solution (Berk, A. J., et al. Cell 12:721-732 (1977); Hood, L. E., et al., In: Molecular Biology of Eukaryotic Cells: A Problems Approach, Menlo Park, Calif.: Benjamin-Cummings, (1975); Wetmer, J. G., Hybridization and Renaturation Kinetics of Nucleic Acids. Ann. Rev. Biophys. Bioeng. 5:337-361 (1976); Itakura, K., et al., Ann. Rev. Biochem. 53:323-356, (1984)), it is preferable to employ a solid-phase hybridization assay (see, Saiki, R. K. et al. Proc. Natl. Acad. Sci. (U.S.A.) 86:6230-6234 (1989); Gilham et al., J. Amer. Chem. Soc. 86:4982 (1964) and Kremsky et al., Nucl. Acids Res. 15:3131-3139 (1987)).

[0133] Any of a variety of methods can be used to immobilize oligonucleotides to the solid support. One of the most widely used methods to achieve such an immobilization of oligonucleotide primers for subsequent use in hybridization-based assays consists of the non-covalent coating of these solid phases with streptavidin or avidin and the subsequent immobilization of biotinylated oligonucleotides (Holmstrom, K. et al., Anal. Biochem. 209:278-283 (1993)). Another known method (Running, J. A. et al., BioTechniques 8:276-277 (1990); Newton, C. R. et al. Nucl. Acids Res. 21:1155-1162 (1993)) requires the pre-coating of the polystyrene or glass solid phases with poly-L-Lys or poly L-Lys, Phe, followed by the covalent attachment of either amino- or sulfhydryl-modified oligonucleotides using bifunctional crosslinking reagents. Both methods have the disadvantage of requiring the use of modified oligonucleotides as well as a pre-treatment of the solid phase.

[0134] In another published method (Kawai, S et al., Anal. Biochem. 209:63-69 (1993)), short oligonucleotide probes were ligated together to form multimers and these were ligated into a phagemid vector. Following in vitro amplification and isolation of the single-stranded form of these phagemids, they were immobilized onto polystyrene plates and fixed by UV irradiation at 254 nm. The probes immobilized in this way were then used to capture and detect a biotinylated PCR product.

[0135] A method for the direct covalent attachment of short, 5′-phosphorylated primers to chemically modified polystyrene plates (“Covalink” plates, Nunc) has also been published (Rasmussen, S. R. et al., Anal. Biochem. 198:138-142 (1991)). The covalent bond between the modified oligonucleotide and the solid phase surface is introduced by condensation with a water-soluble carbodiimide. This method is claimed to assure a predominantly 5′-attachment of the oligonucleotides via their 5′-phosphates; however, it requires the use of specially prepared, expensive plates.

[0136] Most preferably, such immobilization of oligonucleotides (preferably between 15 and 30 bases) is accomplished using a method that can be used directly, without the need for any pre-treatment of commercially available polystyrene microwell plates (ELISA plates) or microscope glass slides. Since 96 well polystyrene plates are widely used in ELISA tests, there has been significant interest in the development of methods for the immobilization of short oligonucleotide primers to the wells of these plates for subsequent hybridization assays. Also of interest is a method for the immobilization to microscope glass slides, since the latter are used in the so-called Slide Immunoenzymatic Assay (SIA) (de Macario, E. C. et al., BioTechniques 3:138-145 (1985)).

[0137] The solid support can be glass, plastic, paper, etc. The support can be fashioned as a bead, dipstick, test tube, etc. In a preferred embodiment, the support will be a microtiter dish, having a multiplicity of wells. The conventional 96-well microtiter dishes used in diagnostic laboratories and in tissue culture are a preferred support. The use of such a support allows the simultaneous determination of a large number of samples and controls, and thus facilitates the analysis. Automated delivery systems can be used to provide reagents to such microtiter dishes. Similarly, spectrophotometric methods can be used to analyze the polymorphic sites, and such analysis can be conducted using automated spectrophotometers.

[0138] One aspect of the present invention concerns a method for immobilizing oligonucleotides for such analysis. In accordance with the method, any of a number of commercially available polystyrene plates can be used directly for the immobilization, provided that they have a hydrophilic surface. Examples of suitable plates include the Immulon 4 plates (Dynatech) and the Maxisorp plates (Nunc). The immobilization of the oligonucleotides to the plates is achieved simply by incubation in the presence of a suitable salt. No immobilization takes place in the absence of a salt, i.e., when the oligonucleotide is present in a water solution. Examples of suitable salts are: 50-250 mM NaCl; 30-100 mM 1-ethyl-3-(3′-dimethylaminopropyl)carbodiimide hydrochloride (EDC), pH 6.8; 50-150 mM octyldimethylamine hydrochloride, pH 7.0; 50-250 mM tetramethylammonium chloride. The immobilization is achieved by incubation, preferably at room temperature for 3 to 24 hours. After such incubation, the plates are washed, preferably with a solution of 10 mM Tris HCl, pH 7.5, containing 150 mM NaCl and 0.05% vol. Tween-20 (TNTw). The latter ingredient serves the important role of blocking all free oligonucleotide binding sites still present on the polystyrene surface, so that no nonspecific binding of oligonucleotides can take place during the subsequent hybridization steps. Using radioactively labeled oligonucleotides, the amount of immobilized oligonucleotides per well was determined to be at least 500 fmoles. The oligonucleotides are immobilized to the surface of the plate with sufficient stability and can only be removed by prolonged incubations with 0.5 M NaOH solutions at elevated temperatures. No oligonucleotide is removed by washing the plate with water, TNTw (Tween 20), PBS, 1.5 M NaCl, or other similar solutions.

[0139] The immobilized oligonucleotides can be used to capture specific DNA sequences by hybridization. The hybridization is usually carried out in a solution containing 1.5 M NaCl and 10 mM EDTA, for 15 to 30 minutes at room temperature. Other hybridization conditions can also be used. More than 400 fmoles of a specific DNA sequence was found to hybridize to the immobilized oligonucleotide in one well. This DNA, which is bound to the initially immobilized oligonucleotide only via Watson-Crick hydrogen bonds, can be easily removed from the wells by a brief wash with a 0.1 M NaOH solution, without removing the initially attached oligonucleotide from the plate. If the captured DNA fragment is nonradioactively labeled, e.g., with a biotin residue, the detection can be carried out using a suitable enzyme-linked assay.

[0140] Although no modifications have to be introduced into the synthetic oligonucleotides, the method also allows for the immobilization of labeled (e.g., biotinylated) oligonucleotides, if desired. The amount of oligonucleotide that can be immobilized in a single well of an ELISA plate by this method is at least 500 fmoles. The oligonucleotides thus immobilized onto the solid phase can hybridize to suitable templates and also participate in enzymatic reactions like template-directed extensions and ligations.

[0141] For high volume testing applications, it is desirable to use non-radioactive detection methods. Thus, the use of haptenated dideoxynucleotides is preferred; the use of biotinylated dideoxynucleotides is particularly preferred as such modification would render the incorporated base detectable by the standard avidin (or streptavidin) enzyme conjugates used in ELISA assays. The biotinylated ddNTPs are preferably prepared by reacting the four respective (3-aminopropyn-1-yl)nucleoside triphosphates with sulfosuccinimidyl 6-(biotinamido)hexanoate. Thus, (3-aminopropyn-1-yl) nucleoside 5′-triphosphates are prepared as described by Hobbs, F. W. (J. Org. Chem. 54:3420-3422 (1989)) and by Hobbs, F. W. et al. (U.S. Pat. No. 5,047,519). The (3-aminopropyn-1-yl)nucleoside 5′-triphosphate (50 μmol) is dissolved in 1 ml of pH 7.6, 1 M aqueous triethylammonium bicarbonate (TEAB). Sulfosuccinimidyl 6-(biotinamido) hexanoate sodium salt (Pierce, 55.7 mg, 100 μmol) is added and the solution is heated to 50° C. in a stoppered tube for 2 hr. The reaction mixture is diluted to 10 ml with water and applied to a DEAE-Sephadex A-25-120 column (1.6×19 cm). The column is eluted with a linear gradient of pH 7.6 aqueous TEAB (0.1 M to 1.0 M) and the eluent monitored at 270 nm. The late-eluting major peak is collected, stripped, and co-evaporated with ethanol. The crude product, containing biotinylated nucleoside triphosphate and, in some cases, contaminating starting material, is further purified by reverse phase column chromatography (Baker C-18 packing, 2×12 cm bed). The material is loaded in 0.1 M pH 7.6 TEAB and eluted with a step gradient of acetonitrile in 0.1 M pH 7.6 TEAB (0% to 36%, 2% increments, 8 ml/step). In all cases, the biotinylated product is more strongly retained and cleanly resolved from the starting material. Product-containing fractions are pooled, stripped, and co-evaporated with ethanol. The product is taken up in water and the yield calculated using the absorption coefficient for the starting nucleotide. The ¹H NMR and ³¹P NMR spectra are consistent with the expected structure and confirm the absence of phosphorus containing or nucleotide-derived impurities. The materials are observed to be >99% pure by HPLC (Waters Bondapak C-18, 4.6×250 mm, 1 ml/min, 1 to 35% CH₃CN/pH 7/0.01 M triethylammonium acetate).

[0142] The synthesis of 5-(3-(6-biotinamido(hexanoamido)propyn-1-yl)-2′,3′-dideoxyuridine-5′-triphosphate has an approximate yield of 25% (assuming ε=12,400 at 291.5 nm); HPLC t_(X)=16.1 min.

[0143] The synthesis of 5-(3-(6-biotinamido(hexanoamido)propyn-1-yl)-2′,3′-dideoxycytidine-5′-triphosphate has an approximate yield of 63% (assuming ε=9,230 at 294.5 nm); HPLC t_(X)=19.4 min.

[0144] The synthesis of 7-(3-(6-biotinamido(hexanoamido)propyn-1-yl)-7-deaza-2′,3′-dideoxyadenosine-5′-triphosphate has an approximate yield of 39% (assuming ε=13,600 at 278.5 nm); HPLC t_(X)=23.1 min.

[0145] The synthesis of 7-(3-(6-biotinamido(hexanoamido)propyn-1-yl)-7-deaza-2′,3′-dideoxyguanosine-5′-triphosphate has an approximate yield of 44% (assuming ε=9,300 at 291 nm); HPLC t_(X)=21.2 min.
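The yields reported above are stated to be calculated from an absorption coefficient, which corresponds to the Beer-Lambert relation. The following is a minimal sketch of that arithmetic only. The absorbance reading, solution volume, and path length are hypothetical values chosen for illustration, and the units of the absorption coefficient (M⁻¹ cm⁻¹) are an assumption, since the text does not state them.

```python
def percent_yield(absorbance, epsilon, volume_ml, start_umol, path_cm=1.0):
    """Estimate percent yield from a UV absorbance reading (Beer-Lambert law).

    epsilon is assumed to be in M^-1 cm^-1; volume_ml is the volume of the
    product solution whose absorbance was read (hypothetical input).
    """
    if epsilon <= 0 or path_cm <= 0 or start_umol <= 0 or volume_ml <= 0:
        raise ValueError("epsilon, path, volume and starting amount must be positive")
    conc_molar = absorbance / (epsilon * path_cm)          # mol per litre
    product_umol = conc_molar * (volume_ml / 1000.0) * 1e6  # micromoles in solution
    return 100.0 * product_umol / start_umol


# Hypothetical reading: A = 1.24 for 125 ml of solution, 50 umol starting material,
# using the coefficient quoted for the dideoxyuridine derivative.
example = percent_yield(absorbance=1.24, epsilon=12_400, volume_ml=125.0, start_umol=50.0)
print(f"{example:.1f}% yield")
```

In practice a concentrated product solution would be diluted before reading; a dilution factor could be folded into `volume_ml` or added as a further parameter.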

[0146] E. Solid Phase Analysis of Polymorphic Sites

[0147] 1. Polymerase-Mediated Analysis

[0148] Although the identity of the nucleotide(s) of the polymorphic sites of the present invention can be determined in a variety of ways, an especially preferred method exploits the oligonucleotide-based diagnostic assay of nucleic acid sequence variation disclosed by Goelet, P. et al. (PCT Application WO92/15712, herein incorporated by reference). In this assay, a purified oligonucleotide having a defined sequence (complementary to an immediate proximal or distal sequence of a polymorphism) is bound to a solid support, especially a microtiter dish. A sample, suspected to contain the target molecule, or an amplification product thereof, is placed in contact with the support, and any target molecules present are permitted to hybridize to the bound oligonucleotide.

[0149] In one preferred embodiment, an oligonucleotide having a sequence that is complementary to an immediately distal sequence of a polymorphism is prepared using the above-described methods (and preferably that of Nikiforov, T. (U.S. patent application Ser. No. 08/005,061)). The terminus of the oligonucleotide is attached to the solid support, as described, for example, by Goelet, P. et al. (PCT Application WO 92/15712), such that the 3′-end of the oligonucleotide can serve as a substrate for primer extension.

[0150] The immobilized primer is then incubated in the presence of a DNA molecule (preferably a genomic DNA molecule) having a single nucleotide polymorphism whose immediately 3′-distal sequence is complementary to that of the immobilized primer. Preferably, such incubation occurs in the complete absence of any dNTP (i.e. dATP, dCTP, dGTP, or dTTP), but only in the presence of one or more chain terminating nucleotide triphosphate derivatives (such as a dideoxy derivative), and under conditions sufficient to permit the incorporation of such a derivative on to the 3′-terminus of the primer. As will be appreciated, where the polymorphic site is such that only two or three alleles exist (such that only two or three species of dNTPs, respectively, could be incorporated into the primer extension product), the presence of unusable nucleotide triphosphate(s) in the reaction is immaterial. In consequence of the incubation, and the use of only chain terminating nucleotide derivatives, a single dideoxynucleotide is added to the 3′-terminus of the primer. The identity of that added nucleotide is determined by, and is complementary to, the nucleotide of the polymorphic site of the polymorphism.

[0151] In this embodiment, the nucleotide of the polymorphic site is thus determined by assaying which of the set of labeled nucleotides has been incorporated onto the 3′-terminus of the bound oligonucleotide by a primer-dependent polymerase. Most preferably, where multiple dideoxynucleotide derivatives are simultaneously employed, different labels will be used to permit the differential determination of the identity of the incorporated dideoxynucleotide derivative.
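The read-out logic of this embodiment can be sketched in a few lines: the nucleotide at the polymorphic site is the Watson-Crick complement of whichever labeled terminator was incorporated. The function name and the treatment of two detected labels as a heterozygous sample are assumptions made for illustration, not part of the described assay.

```python
# Watson-Crick complements for the four DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def template_alleles(incorporated_ddntps):
    """Return the template nucleotide(s) at the polymorphic site.

    incorporated_ddntps: the bases of the labeled terminators detected on the
    immobilized primer (assumed: one base for a homozygous sample, two for a
    heterozygous sample).
    """
    bases = {b.upper() for b in incorporated_ddntps}
    unknown = bases - COMPLEMENT.keys()
    if not bases or unknown:
        raise ValueError(f"expected one or more of A, C, G, T; got {sorted(incorporated_ddntps)}")
    # Each incorporated terminator pairs with its complement on the template.
    return sorted(COMPLEMENT[b] for b in bases)


print(template_alleles({"G"}))       # only ddG incorporated
print(template_alleles({"G", "A"}))  # both ddG and ddA incorporated
```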

[0152] 2. Polymerase/Ligase-Mediated Analysis

[0153] In an alternative embodiment, the identity of the nucleotide of the polymorphic site is determined using a polymerase/ligase-mediated process. As in the above embodiment, an oligonucleotide primer is employed that is complementary to the immediately 3′-distal invariant sequence of the SNP. A second oligonucleotide is tethered to the solid phase via its 3′-end. The sequence of this oligonucleotide is complementary to the 5′-proximal sequence of the polymorphism being analyzed, but is incapable of hybridizing to the oligonucleotide primer.

[0154] These oligonucleotides are incubated in the presence of DNA containing the single nucleotide polymorphism that is to be analyzed, and at least one 2′,5′-deoxynucleotide triphosphate. The incubation reaction further includes a DNA polymerase and a DNA ligase. Thus, for example, where the polymorphism of clone 177-2 (Table 1) is being evaluated and the tethered oligonucleotide comprises the 3′-distal sequence of SEQ ID NO: 2, the second oligonucleotide would have the 5′-proximal sequence of SEQ ID NO: 1.

[0155] The tethered and soluble oligonucleotides are thus capable of hybridizing to the same strand of the single nucleotide polymorphism under analysis. The sequence considerations cause the two oligonucleotides to hybridize to the proximal and distal sequences of the SNP that flank the polymorphic site (X) of the polymorphism; the hybridized oligonucleotides are thus separated by a “gap” of a single nucleotide at the precise position of the polymorphic site.

[0156] The presence of a polymerase and a 2′,5′-deoxynucleotide triphosphate complementary to (X) permits ligation of the primer, extended with the complementary 2′,5′-deoxynucleotide triphosphate, to the immobilized oligonucleotide complementary to the distal sequence; that is, a 2′,5′-deoxynucleotide triphosphate that is complementary to the nucleotide of the polymorphic site permits the creation of a ligatable substrate. The ligation reaction immobilizes the 2′,5′-deoxynucleotide and the previously soluble primer oligonucleotide to the solid support.

[0157] The identity of the polymorphic site that was opposite the “gap” can then be determined by any of several means. In a preferred embodiment, the 2′,5′-deoxynucleotide triphosphate of the reaction is labeled, and its detection thus reveals the identity of the complementary nucleotide of the polymorphic site. Several different 2′,5′-deoxynucleotide triphosphates may be present, each differentially labeled. Alternatively, separate reactions can be conducted, each with a different 2′,5′-deoxynucleotide triphosphate. In an alternative sub-embodiment, the 2′,5′-deoxynucleotide triphosphates are unlabeled, and the second, soluble oligonucleotide is labeled. Separate reactions are conducted, each using a different unlabeled 2′,5′-deoxynucleotide triphosphate. The reaction that contains the complementary nucleotide permits the ligatable substrate to form, and is detected by detecting the immobilization of the previously soluble oligonucleotide.
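The single-nucleotide “gap” described above can be illustrated with a small string-based model. This is a sketch under stated assumptions: it uses the clone 177-2 sequences of Table 1, takes the C/T strand as the template (a choice made here for illustration), and the function names are hypothetical. It models only sequence geometry, not hybridization or enzyme behavior.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA string written 5'->3'."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def gap_fill_base(template, primer, tethered):
    """Return the nucleotide that fills the one-base gap, or None.

    template: strand under analysis (5'->3').
    primer:   soluble oligonucleotide complementary to the 3'-distal flank.
    tethered: immobilized oligonucleotide complementary to the 5'-proximal flank.
    """
    proximal_site = reverse_complement(tethered)  # template region bound by tethered oligo
    distal_site = reverse_complement(primer)      # template region bound by primer
    start = template.find(proximal_site)
    end = template.find(distal_site)
    if start < 0 or end < 0:
        return None  # one of the oligonucleotides does not anneal
    gap_start = start + len(proximal_site)
    if end - gap_start != 1:
        return None  # the oligonucleotides do not flank a single-nucleotide gap
    # The nucleotide that permits ligation is the complement of the site.
    return reverse_complement(template[gap_start])


# Clone 177-2 flanking sequences from Table 1.
SEQ_1 = "GCAGCTCTAAGTGCTGTGGG"  # 5'-proximal, C/T strand
SEQ_2 = "TGCAGAAATTCTAAGGTGTT"  # 3'-distal, C/T strand
SEQ_3 = "AACACCTTAGAATTTCTGCA"  # complementary to SEQ_2
SEQ_4 = "CCCACAGCACTTAGAGCTGC"  # complementary to SEQ_1

for allele in "CT":
    print(allele, "->", gap_fill_base(SEQ_1 + allele + SEQ_2, SEQ_3, SEQ_4))
```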

[0158] F. Signal-Amplification

[0159] The sensitivity of nucleic acid hybridization detection assays may be increased by altering the manner in which detection is reported or signaled to the observer. Thus, for example, assay sensitivity can be increased through the use of detectably labeled reagents. A wide variety of such signal amplification methods have been designed for this purpose. Kourilsky et al. (U.S. Pat. No. 4,581,333) describe the use of enzyme labels to increase sensitivity in a detection assay. Fluorescent labels (Albarella et al., EP 144914), chemical labels (Sheldon III et al., U.S. Pat. No. 4,582,789; Albarella et al., U.S. Pat. No. 4,563,417), modified bases (Miyoshi et al., EP 119448), etc. have also been used in an effort to improve the efficiency with which hybridization can be observed.

[0160] It is preferable to employ fluorescent, and more preferably chromogenic (especially enzyme) labels, such that the identity of the incorporated nucleotide can be determined in an automated, or semi-automated manner using a spectrophotometer.

[0161] IV. The Use of SNP Genotyping in Methods of Genetic Analysis

[0162] A. General Considerations for Using Single Nucleotide Polymorphisms in Genetic Analysis

[0163] The utility of the polymorphic sites of the present invention stems from the ability to use such sites to predict the statistical probability that two individuals will have the same alleles for any given polymorphism.

[0164] Statistical analysis of SNPs can be used for any of a variety of purposes. Where a particular animal has been previously tested, such testing can be used as a “fingerprint” with which to determine if a certain animal is, or is not, that particular animal.

[0165] Where a putative parent or both parents of an individual have been tested, the methods of the present invention may be used to determine the likelihood that a particular animal is or is not the progeny of such parent or parents. Thus, the detection and analysis of SNPs can be used to exclude paternity of a male for a particular individual (such as a stallion's paternity of a particular foal), or to assess the probability that a particular individual is the progeny of a selected female (such as a particular foal and a selected mare).

[0166] As indicated below, the present invention permits the construction of a genetic map of a target species. Thus, the particular array of polymorphisms identified by the methods of the present invention can be correlated with a particular trait, in order to predict the predisposition of a particular animal (or plant) to such genetic disease, condition, or trait. As used herein, the term “trait” is intended to encompass “genetic disease,” “condition,” or “characteristics.” The term “genetic disease” denotes a pathological state caused by a mutation, regardless of whether that state can be detected or is asymptomatic. A “condition” denotes a predisposition to a characteristic (such as asthma, weak bones, blindness, ulcers, cancers, heart or cardiovascular illnesses, skeleto-muscular defects, etc.). A “characteristic” is an attribute that imparts economic value to a plant or animal. Examples of characteristics include longevity, speed, endurance, rate of aging, fertility, etc.

[0167] B. Identification and Parentage Verification

[0168] The most useful measurements for determining the power of an identification and paternity testing system are: (i) the “probability of identity” (p(ID)) and (ii) the “probability of exclusion” (p(exc)). The p(ID) calculates the likelihood that two random individuals will have the same genotype with respect to a given polymorphic marker. The p(exc) calculates the likelihood, with respect to a given polymorphic marker, that a random male will have a genotype incompatible with him being the father in an average paternity case in which the identity of the mother is not in question. Since single genetic loci, including loci with numerous alleles such as the major histocompatibility region, rarely provide tests with adequate statistical confidence for paternity testing, a desirable test will preferably measure multiple unlinked loci in parallel. Cumulative probabilities of identity or non-identity, and cumulative probabilities of paternity exclusion are determined for these multi-locus tests by multiplying the probabilities provided by each locus.

[0169] The statistical measurements of greatest interest are: (i) the cumulative probability of non-identity (cum p(nonID)), and (ii) the cumulative probability of paternity exclusion (cum p(exc)).

[0170] The formulas used for calculating these probability values are given below. For simplicity these are given first for 2-allele loci, where one allele is termed type A and the other type B. In such a model, four genotypes are possible: AA, AB, BA, and BB (types AB and BA being indistinguishable biochemically). The allelic frequency is given by the number of times A (f(A), the frequency of A is denoted by “p”) or B (f(B), the frequency of B is denoted by “q,” where q=1−p) is found in the haploid genome. The probability of a given genotype at a given locus:

Homozygote: p(AA)=p²

Single Heterozygote: p(AB)=p(BA)=pq=p(1−p)

Both Heterozygotes: p(AB+BA)=2pq=2p(1−p)

Homozygote: p(BB)=q²=(1−p)²
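As a quick check on the genotype probabilities listed above, the three distinguishable genotype classes (AA, the combined heterozygote, and BB) should sum to one for any allele frequency p. A minimal sketch follows; the function name is hypothetical.

```python
def genotype_frequencies(p):
    """Genotype probabilities at a 2-allele locus with f(A) = p and f(B) = 1 - p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must lie between 0 and 1")
    q = 1.0 - p
    return {
        "AA": p * p,      # homozygote for A
        "AB": 2 * p * q,  # both heterozygotes (AB + BA), indistinguishable
        "BB": q * q,      # homozygote for B
    }


freqs = genotype_frequencies(0.79)
for genotype, value in freqs.items():
    print(f"p({genotype}) = {value:.4f}")
print("total =", round(sum(freqs.values()), 12))
```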

[0171] The probability of identity at one locus (i.e. the probability that two individuals, picked at random from a population, will have identical genotypes at a given locus) is given by the equation:

p(ID)=(p²)²+(2pq)²+(q²)²

[0172] The cumulative probability of identity for n loci is therefore given by the equation:

cum p(ID)=p(ID₁) p(ID₂) p(ID₃) . . . p(IDₙ)

[0173] The cumulative probability of non-identity for n loci (i.e. the probability that two individuals will be different at 1 or more loci) is given by the equation:

cum p(nonID)=1−cum p(ID)
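The identity equations above translate directly into code. The sketch below assumes unlinked 2-allele loci, each described only by the frequency p of one allele; the function names are hypothetical.

```python
from math import prod


def p_identity(p):
    """p(ID) at one 2-allele locus: sum of squared genotype frequencies."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must lie between 0 and 1")
    q = 1.0 - p
    return (p * p) ** 2 + (2 * p * q) ** 2 + (q * q) ** 2


def cum_p_identity(allele_freqs):
    """Cumulative p(ID): product of the per-locus values."""
    return prod(p_identity(p) for p in allele_freqs)


def cum_p_nonidentity(allele_freqs):
    """Probability that two random individuals differ at one or more loci."""
    return 1.0 - cum_p_identity(allele_freqs)


# Example: a panel of ten loci, each with allele frequency p = 0.5.
panel = [0.5] * 10
print(f"p(ID) per locus = {p_identity(0.5):.4f}")
print(f"cum p(nonID)    = {cum_p_nonidentity(panel):.6f}")
```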

[0174] The probability of parentage exclusion (representing the probability that a random male will have a genotype, with respect to a given locus, that makes him incompatible as the sire in an average paternity case where the identity of the mother is not in question) is given by the equation:

p(exc)=pq(1−pq)

[0175] The probability of non-exclusion (representing the probability at a given locus that a random male will not be biochemically excluded as the sire in an average paternity case) is given by the equation:

p(non−exc)=1−p(exc)

[0176] The cumulative probability of non-exclusion (representing the value obtained when n loci are used) is thus:

cum p(non−exc)=p(non−exc₁) p(non−exc₂) p(non−exc₃) . . . p(non−excₙ)

[0177] The cumulative probability of exclusion (representing the probability, using a panel of n loci, that a random male will be biochemically excluded as the sire in an average paternity case where the mother is not in question) is given by the equation:

cum p(exc)=1−cum p(non−exc)
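The exclusion equations can be chained in the same way as the identity equations. The following sketch assumes unlinked 2-allele loci described by the frequency p of one allele; the function names are hypothetical.

```python
from math import prod


def p_exclusion(p):
    """p(exc) at one 2-allele locus: pq(1 - pq)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must lie between 0 and 1")
    pq = p * (1.0 - p)
    return pq * (1.0 - pq)


def cum_p_exclusion(allele_freqs):
    """Cumulative p(exc) = 1 - product of per-locus p(non-exc)."""
    cum_non_exclusion = prod(1.0 - p_exclusion(p) for p in allele_freqs)
    return 1.0 - cum_non_exclusion


# Example: panels of increasing size, each locus with p = 0.5.
for n_loci in (1, 5, 10, 20):
    print(n_loci, f"{cum_p_exclusion([0.5] * n_loci):.4f}")
```

Because each locus contributes a factor below one to the cumulative non-exclusion, the cumulative exclusion probability rises with every informative locus added.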

[0178] These calculations may be extended for any number of alleles at a given locus. For example, the probability of identity p(ID) for a 3-allele system where the alleles have the frequencies in the population of p, q and r, respectively, is equal to the sum of the squares of the genotype frequencies:

p(ID)=p⁴+(2pq)²+(2qr)²+(2pr)²+r⁴+q⁴

[0179] Similarly, the probability of exclusion for a three allele system is given by:

p(exc)=pq(1−pq)+qr(1−qr)+pr(1−pr)+3pqr(1−pqr)

[0180] In a locus of n alleles, the appropriate binomial expansion is used to calculate p(ID) and p(exc).
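Since p(ID) is described as the sum of the squares of the genotype frequencies, it can be written once for any number of alleles. The sketch below does that, and implements p(exc) only for the three-allele case, exactly as the equation above gives it; no general n-allele exclusion formula is assumed. Function names are hypothetical.

```python
from itertools import combinations


def _check_frequencies(freqs):
    """Reject allele frequencies that are negative or do not sum to one."""
    if any(f < 0 for f in freqs) or abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must be non-negative and sum to 1")


def p_identity(freqs):
    """p(ID) for any number of alleles: sum of squared genotype frequencies."""
    _check_frequencies(freqs)
    homozygotes = sum(f ** 4 for f in freqs)                     # (f^2)^2
    heterozygotes = sum((2 * a * b) ** 2 for a, b in combinations(freqs, 2))
    return homozygotes + heterozygotes


def p_exclusion_three_allele(p, q, r):
    """p(exc) for a 3-allele locus, term by term as in the equation above."""
    _check_frequencies((p, q, r))
    return (p * q * (1 - p * q) + q * r * (1 - q * r)
            + p * r * (1 - p * r) + 3 * p * q * r * (1 - p * q * r))


print(f"p(ID),  2 alleles (0.5, 0.5):        {p_identity([0.5, 0.5]):.4f}")
print(f"p(ID),  3 alleles (0.51, 0.34, 0.15): {p_identity([0.51, 0.34, 0.15]):.4f}")
print(f"p(exc), 3 alleles (0.51, 0.34, 0.15): {p_exclusion_three_allele(0.51, 0.34, 0.15):.4f}")
```

With two alleles the general function reduces to the 2-allele equation given earlier, which is a convenient consistency check.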

[0181] FIGS. 4 and 5 show how the cum p(nonID) and the cum p(exc) increase with both the number and type of genetic loci used. It can be seen that greater discriminatory power is achieved with fewer markers when using three allele systems. In FIGS. 4 and 5, the triangles trace the increase in probability values with increasing numbers of loci with two alleles where the common allele is present at a frequency of p=0.79. The crosses in FIGS. 4 and 5 show the same analysis for increasing numbers of three-allele loci where p=0.51, q=0.34 and r=0.15.
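The comparison drawn in FIGS. 4 and 5 can be explored numerically by asking how many identical, unlinked loci are needed to reach a chosen cumulative probability. The sketch below uses the allele frequencies quoted above; the target of 0.999 and the function names are assumptions made for illustration, and the figures themselves are not reproduced.

```python
from itertools import combinations

TWO_ALLELE = (0.79, 0.21)           # frequencies quoted for the 2-allele loci
THREE_ALLELE = (0.51, 0.34, 0.15)   # frequencies quoted for the 3-allele loci


def p_identity(freqs):
    """p(ID): sum of the squares of the genotype frequencies."""
    homozygotes = sum(f ** 4 for f in freqs)
    heterozygotes = sum((2 * a * b) ** 2 for a, b in combinations(freqs, 2))
    return homozygotes + heterozygotes


def loci_needed(per_locus_miss, target, max_loci=10_000):
    """Smallest n with 1 - per_locus_miss**n >= target.

    per_locus_miss is the chance that a single locus fails to discriminate,
    i.e. p(ID) for identity testing or p(non-exc) for paternity exclusion.
    """
    if not 0.0 < per_locus_miss < 1.0 or not 0.0 < target < 1.0:
        raise ValueError("per_locus_miss and target must lie strictly between 0 and 1")
    cumulative_miss = 1.0
    for n in range(1, max_loci + 1):
        cumulative_miss *= per_locus_miss
        if 1.0 - cumulative_miss >= target:
            return n
    raise ValueError("target not reached within max_loci")


TARGET = 0.999  # hypothetical target for cum p(nonID)
n_two = loci_needed(p_identity(TWO_ALLELE), TARGET)
n_three = loci_needed(p_identity(THREE_ALLELE), TARGET)
print(f"2-allele loci needed: {n_two}; 3-allele loci needed: {n_three}")
```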

[0182] The choice between whether to use loci with 2, 3 or more alleles is however largely influenced by the above-described biochemical considerations. A polymorphic analysis test may be designed to score for any number of alleles at a given locus. If allelic scoring is to be performed using gel electrophoresis, each allele should be easily resolvable by gel electrophoresis. Since the length variations in multiple allelic families are often small, human DNA tests using multiple allelic families include statistical corrections for mistaken identification of alleles. Furthermore, although the appearance of a rare allele from a multiple allelic system may be highly informative, the rarity of these alleles makes accurate measurements of their frequency in the population extremely difficult. To correct for errors in these frequency estimates when using rare alleles, the statistical analysis of this data must include a measure of the cumulative effects of uncertainty in these frequency estimates. The use of these multiple allelic systems also increases the likelihood that new or rare alleles in the population will be discovered during the course of large population screening. The integrity of previously collected genetic data would be empirically revised to reflect the discovery of a new allele.

[0183] In view of these considerations, although the use of loci with many alleles could potentially offer some short-term advantages (because fewer loci would need to be screened), it is preferable to perform polymorphic analyses using loci with fewer alleles that are: (i) more frequently represented, and (ii) easier to measure unambiguously. Tests of this type can achieve the same power of discrimination as tests based on more highly polymorphic loci, provided the same total number of alleles is collected from a series of unlinked loci.

[0184] C. Gene Mapping and Genetic Trait Analysis Using SNPs

[0185] The polymorphisms detected in a set of individuals of the same species (such as humans, horses, etc.), or of closely related species, can be analyzed to determine whether the presence or absence of a particular polymorphism correlates with a particular trait.

[0186] To perform such polymorphic analysis, the presence or absence of a set of polymorphisms (i.e. a “polymorphic array”) is determined for a set of the individuals, some of which exhibit a particular trait, and some of which exhibit a mutually exclusive characteristic (for example, with respect to horses, brittle bones vs. non-brittle bones; maturity onset blindness vs. no blindness; predisposition to asthma, cardiovascular disease vs. no such predisposition). The alleles of each polymorphism of the set are then reviewed to determine whether the presence or absence of a particular allele is associated with the particular trait of interest.

[0187] Any such correlation defines a genetic map of the individual's species. Alleles that do not segregate randomly with respect to a trait can be used to predict the probability that a particular animal will express that characteristic. For example, if a particular polymorphic allele is present in only 20% of the members of a species that exhibit a cardiovascular condition, then a particular member of that species containing that allele would have a 20% probability of exhibiting such a cardiovascular condition. As indicated, the predictive power of the analysis is increased by the extent of linkage between a particular polymorphic allele and a particular characteristic. Similarly, the predictive power of the analysis can be increased by simultaneously analyzing the alleles of multiple polymorphic loci and a particular trait. In the above example, if a second polymorphic allele was found to also be present in 20% of members exhibiting the cardiovascular condition, but all of the evaluated members that exhibited such a cardiovascular condition had a particular combination of alleles for these first and second polymorphisms, then a particular member containing both such alleles would have a very high probability of exhibiting the cardiovascular condition.

[0188] The detection of multiple polymorphic sites permits one to define the frequency with which such sites independently segregate in a population. If, for example, two polymorphic sites segregate randomly, then they are either on separate chromosomes, or are distant to one another on the same chromosome. Conversely, two polymorphic sites that are co-inherited at significant frequency are linked to one another on the same chromosome. An analysis of the frequency of segregation thus permits the establishment of a genetic map of markers. Thus, the present invention provides a means for mapping the genomes of plants and animals.

[0189] The resolution of a genetic map is proportional to the number of markers that it contains. Since the methods of the present invention can be used to isolate a large number of polymorphic sites, they can be used to create a map having any desired degree of resolution.

[0190] The sequencing of the polymorphic sites greatly increases their utility in gene mapping. Such sequences can be used to design oligonucleotide primers and probes that can be employed to “walk” down the chromosome and thereby identify new marker sites (Bender, W. et al., J. Supra. Molec. Struc. 10(suppl.):32 (1979); Chinault, A. C. et al., Gene 5:111-126 (1979); Clarke, L. et al., Nature 287:504-509 (1980)).

[0191] The resolution of the map can be further increased by combining polymorphic analyses with data on the phenotype of other attributes of the plant or animal whose genome is being mapped. Thus, if a particular polymorphism segregates with brown hair color, then that polymorphism maps to a locus near the gene or genes that are responsible for hair color. Similarly, biochemical data can be used to increase the resolution of the genetic map. In this embodiment, a biochemical determination (such as a serotype, isoform, etc.) is studied in order to determine whether it co-segregates with any polymorphic site. Such maps can be used, for example, to identify new gene sequences or to identify the causal mutations of disease.

[0192] Indeed, the identification of the SNPs of the present invention permits one to use complementary oligonucleotides as primers in PCR or other reactions to isolate and sequence novel gene sequences located on either side of the SNP. The invention includes such novel gene sequences. The genomic sequences that can be clonally isolated through the use of such primers can be transcribed into RNA, and expressed as protein. The present invention also includes such protein, as well as antibodies and other binding molecules capable of binding to such protein.

[0193] The invention is illustrated below with respect to two of its embodiments: horses and humans. However, because the fundamental tenets of genetics apply irrespective of species, such illustration is equally applicable to any other species. Those of ordinary skill would therefore need only to directly employ the methods of the above invention to isolate SNPs in any other species, and to thereby conduct the genetic analysis of the present invention.

[0194] As indicated above, LOD scoring methodology has been developed to permit the use of RFLPs to both track the inheritance of genetic traits, and to construct a genetic map of a species (Lander, S. et al., Proc. Natl. Acad. Sci. (U.S.A.) 83:7353-7357 (1986); Lander, S. et al., Proc. Natl. Acad. Sci. (U.S.A.) 84:2363-2367 (1987); Donis-Keller, H. et al., Cell 51:319-337 (1987); Lander, S. et al., Genetics 121:185-199 (1989)). Such methods can be readily adapted to permit their use with the polymorphisms of the present invention. Indeed, such polymorphisms are superior to RFLPs and STRs in this regard. Due to the frequency of SNPs, it is possible to readily generate a dense genetic map. Moreover, as indicated above, the polymorphisms of the present invention are more stable than typical (VNTR-type) RFLP polymorphisms.
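For orientation, the simplest form of a LOD score can be sketched as follows. This is the textbook two-point, phase-known calculation, offered as an assumption-laden illustration; it is not the multilocus methodology of the references cited above. The counts of recombinant and total offspring are hypothetical inputs.

```python
from math import log10


def two_point_lod(recombinants, total):
    """Return (theta, LOD) for a phase-known two-point linkage test.

    theta is the observed recombination fraction, capped at 0.5 (no linkage).
    LOD = log10( L(theta) / L(0.5) ).
    """
    if total <= 0 or not 0 <= recombinants <= total:
        raise ValueError("need 0 <= recombinants <= total and total > 0")
    theta = min(recombinants / total, 0.5)
    non_recombinants = total - recombinants
    likelihood_linked = (1.0 - theta) ** non_recombinants * theta ** recombinants
    likelihood_unlinked = 0.5 ** total
    return theta, log10(likelihood_linked / likelihood_unlinked)


# Hypothetical counts: 1 recombinant among 20 informative offspring.
theta, lod = two_point_lod(recombinants=1, total=20)
print(f"theta = {theta:.2f}, LOD = {lod:.2f}")
```

By convention, a LOD score of 3 or more is commonly taken as evidence of linkage between two markers.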

[0195] The polymorphisms of the present invention comprise direct genomic sequence information and can therefore be typed by a number of methods. In an RFLP or STR-dependent map, the analysis must be gel-based, and entail obtaining an electrophoretic profile of the DNA of the target animal. In contrast, an analysis of the polymorphisms (SNPs) of the present invention may be performed using spectrophotometric methods, and can readily be automated to facilitate the analysis of large numbers of target animals.

[0196] Having now generally described the invention, the same will be more readily understood through reference to the following examples of the isolation and analysis of equine polymorphisms which are provided by way of illustration, and are not intended to be limiting of the present invention.

EXAMPLE 1

[0197] Discovery of Equine Polymorphisms

[0198] As an initial step in the identification of equine polymorphisms, small shotgun libraries were prepared from genomic DNA isolated from peripheral blood leukocytes which had been purified on a Ficoll-Hypaque density gradient from the blood of a single, 15 year old thoroughbred gelding (John Henry). This DNA was simultaneously digested to completion with Bam HI and Pst I and either used directly or after size fractionation on agarose gels.

[0199] Vector pLT14 (a variant of the Stratagene plasmid pKSM13(−)) was digested with Bam HI and Pst I and linearized DNA was purified from an agarose gel. For both vector and size-fractionated genomic DNA, agarose plugs were solubilized in saturated sodium iodide and the DNA was subsequently immobilized on glass powder. After washing, the DNA was eluted with water and ethanol precipitated with glycogen carrier.

[0200] Ligations with varying vector/insert ratios were effectuated with T4 DNA ligase at 4° C. E. coli strain XLI was transformed with ligation mixtures and plated on LB agar containing 100 µg/ml ampicillin. Approximately 50,000 clones were generated in several different experiments using size fractionated or unfractionated insert DNA. Unplated transformed cells were stored at −70° C. in 7% DMSO. Colonies were streaked for isolation and small scale plasmid preparations were performed to determine the size of inserted equine DNA. Larger scale preparations were performed with Qiagen chromatography.

[0201] The sequence of the first 200-300 nucleotides of the genomic insert was determined by the chain terminating dideoxynucleoside method with T7 DNA polymerase from primers complementary to plasmid sequences. This information was used to design synthetic oligonucleotide primers complementary to the equine sequence to be employed in PCR reactions.

[0202] In most cases, two sets of PCR primers (generally 25-mers) were synthesized. The first set was used to amplify, under a standardized set of conditions, from genomic DNA. The products of these reactions were diluted and used as template DNA in a second PCR using nested primers slightly internal to the original set. The products of these two reactions were compared to those obtained using the original plasmid DNA as template. In most cases, it was possible to obtain high quality, single-species products using this procedure with no attempt to optimize reaction conditions for any particular pair of primers.

[0203] Two different methods were used to screen amplified DNA from horses for polymorphic sequences. Initially, PCR fragments from a panel of 6 horses were digested with a panel of restriction endonucleases having 4 base recognition sites. The products of these reactions were analyzed by acrylamide gel electrophoresis on 5%-7.5% non-denaturing gels. Digestion products which showed variability when hybridized to different members of the panel were subjected to DNA sequence analysis. Later, DNA sequencing was used directly to screen for polymorphic sites. The PCR fragments from five unrelated horses were electroeluted from acrylamide gels and sequenced using repetitive cycles of thermostable Taq polymerase reaction in the presence of a mixture of dNTPs and fluorescent ddNTPs. The products were then separated and analyzed using the automated DNA sequencing instrument of Applied Biosystems, Inc. The data was analyzed using ABI software. Differences between sequences of different animals were identified by the software and confirmed by inspecting the relevant portion of the chromatograms on the computer screen. Differences were concluded to be a DNA polymorphism only if the data was available for both strands, and/or present in more than one haploid example among the five horses tested.

EXAMPLE 2

[0204] Characterization of Equine Polymorphisms

[0205] The program of identification and characterization of polymorphic DNA sequences in randomly selected fragments was continued such that approximately 550 plasmids have been characterized to this level. The sequences adjacent to the cloning sites were determined for 200 of these plasmids. Inserts of these sequenced plasmids ranged in size from 0.25 to 3.5 kb. Using this sequence information, oligonucleotide primers were designed to enable PCR amplification of the same genomic region from different horses.

[0206] In order to identify the nucleotides present at polymorphicsites, PCR fragments from 5 horses were purified from acrylamide gels byelectroelution and completely sequenced using Taq polymerase “Cycle”sequencing biochemistry and automated sequencing equipment. Results fromthe 5 horses were analyzed by computer and visually confirmed. DNAsequence variants discovered by this method were scored only if thesequence was obtained on both strands and the variant sequence had beenfound in more than one haploid example. The 18 clones of Table 1comprise a subset of identified SNPs. In Table 1, the immediately5′-proximal sequence, the identity of the nucleotide of the polymorphicsite, and the immediately 3′-distal sequence of each SNP is presented.For each SNP, Such sequences are shown in the horizontal rows. Thesequences of double-stranded DNA in Table 1 is presented in compliancewith the Sequence Listing requirements of the United States Patent andTrademark Office. Thus, all sequences are presented in the sameorientation (5′→3′). The organization of the Table is illustrated inFIG. 6 with respect to an illustrative SNP, clone 177-2. This SNP has apolymorphic site capable of having either a C or a T in one strand, anda G or A in the opposite strand. The 5′-proximal DNA sequence thatimmediately precedes the polymorphic site in the C/T strand isdesignated as SEQ ID NO: 1. The 3′-distal sequence that immediatelyfollows the polymorphic site in the CIT strand is designated as SEQ IDNO: 2. The 5′-proximal DNA sequence that immediately precedes thepolymorphic site in the G/A strand is designated as SEQ ID NO: 3. The3′-distal sequence that immediately follows the polymorphic site in theG/A strand is designated as SEQ ID NO: 4. Bearing in mind that thesequences are written in the same orientation (5′→3′), it will be seenthat the sequences of SEQ ID NO: 1 and SEQ ID NO: 4 are complimentary;similarly, the sequences of SEQ ID NO: 2 and SEQ ID NO: 3 arecomplimentary. 
The sequences that flank a particular polymorphic site are thus obtained by combining the proximal sequence of one row with the distal sequence also shown in the same row.

TABLE 1
POLYMORPHIC LOCI IDENTIFIED

SNP CLONE  SEQ ID NO.  5′ PROXIMAL SEQUENCE  ALLELE 1  ALLELE 2  3′ DISTAL SEQUENCE    SEQ ID NO.
177-2      1           GCAGCTCTAAGTGCTGTGGG  C         T         TGCAGAAATTCTAAGGTGTT  2
           3           AACACCTTAGAATTTCTGCA  G         A         CCCACAGCACTTAGAGCTGC  4
595-3      5           AGCTCTGGGATGATCCACTA  A         G         TGAGGGAAAAATGATGATGC  6
           7           GCATCATCATTTTTCCCTCA  T         C         TAGTGGATCATCCCAGAGCT  8
090-2      9           AAAACTAATTTGATGGCCAT  G         A         AAAGTCAGAACAATGATTGC  10
           11          GCAATCATTGTTCTGACTTT  C         T         ATGGCCATCAAATTAGTTTT  12
324-1      13          CACAAGGCCCAAGAACAGGA  T         C         TGAGTTCAGCGAGTGTCAGA  14
           15          TCTGACACTCGCTGAACTCA  A         G         TCCTGTTCTTGGGCCTTGTG  16
129-1      17          TGGGAAAGACCACATTATTT  T         A         GTTCCCTTTTGTTTCAGACC  18
           19          GGTCTGAAACAAAAGGGAAC  A         T         AAATAATGTGGTCTTTCCCA  20
007-1      21          CATGAGTAAGAAGCATCCGG  G         C         CCATGGAGTCATAGATAAGT  22
           23          ACTTATCTATGACTCCATGG  C         G         CCGGATGCTTCTTACTCATG  24
324-2      25          CCCAAGAACAGGATTGAGTT  C         T         AGCGAGTGTCAGAGTTGTGT  26
           27          ACACAACTCTGACACTCGCT  G         A         AACTCAATCCTGTTCTTGGG  28
177-3      29          AGCAAGAAATGGGGGGCCTT  A         G         GTCCTACAATTGCCAGGAAG  30
           31          CTTCCTGGCAATTGTAGGAC  T         C         AAGGCCCCCCATTTCTTGCT  32
595-1      33          GAATATCAATATATATATAT  G         A         TGTGTGTGTGTGTATTTGCT  34
           35          AGCAAATACACACACACACA  C         T         ATATATATATATTGATATTC  36
007-3      37          GCCATAATTAAGCCTGTATT  A         G         GTTTGTTTTAAATTTTGTGA  38
           39          TCACAAAATTTAAAACAAAC  T         C         AATACAGGCTTAATTATGGC  40
459-1      41          GTGTAGAGTAGTTCAAGGAC  A         C         ATGTCTTATACCTCCCTTTT  42
           43          AAAAGGGAGGTATAAGACAT  T         G         GTCCTTGAACTACTCTACAC  44
085-1      45          GTGAACGGAGAGCAGGCCTT  C         G         CCTGCTGAAGCCTCAGACCG  46
           47          CGGTCTGAGGCTTCAGCAGG  G         C         AAGGCCTGCTCTCCGTTCAC  48
007-2      49          CTGCTCTTTAGACTATGACC  G         A         TCAACCTTGCATCATGAGCT  50
           51          AGCTCATGATGCAAGGTTGA  C         T         GGTCATAGTCTAAAGAGCAG  52
474-1      53          TTTGAGCTGGGACCTCAGTC  T         A         TCTCCTGCCTTTAGACTCGA  54
           55          TCGAGTCTAAAGGCAGGAGA  A         T         GACTGAGGTCCCAGCTCAAA  56
178-1      57          GAACCTCTGGGCCGTGGATA  A         G         TTGTTCAGAAGCACAGGTGA  58
           59          TCACCTGTGCTTCTGAACAA  T         C         TATCCACGGCCCAGAGGTTC  60
595-2      61          GTATTTGCTAGCTCTGGGAT  T         G         ATCCACTAATGAGGGAAAAA  62
           63          TTTTTCCCTCATTAGTGGAT  A         C         ATCCCAGAGCTAGCAAATAC  64
177-1      65          GAAGTTGTGGGACAGATGTG  C         A         AGAGATGCAGCTCTAAGTGC  66
           67          GCACTTAGAGCTGCATCTCT  G         T         CACATCTGTCCCACAACTTC  68
459-2      69          CCATGAGGAAGCCTCCACAA  C         G         GTCCCAATAGTCTGGGATTC  70
           71          GAATCCCAGACTATTGGGAC  G         C         TTGTGGAGGCTTCCTCATGG  72
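The strand relationship described for clone 177-2 can be checked mechanically. The following is a minimal illustrative sketch in Python (the function and variable names are assumptions introduced here, not part of the specification); it confirms that SEQ ID NO: 1 and SEQ ID NO: 4, and likewise SEQ ID NO: 2 and SEQ ID NO: 3, are reverse complements when all four are written 5′→3′.

```python
# Illustrative check only; names below are hypothetical, not from the specification.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


# Flanking sequences of clone 177-2, copied from Table 1.
SEQ_ID_1 = "GCAGCTCTAAGTGCTGTGGG"  # 5'-proximal, C/T strand
SEQ_ID_2 = "TGCAGAAATTCTAAGGTGTT"  # 3'-distal, C/T strand
SEQ_ID_3 = "AACACCTTAGAATTTCTGCA"  # 5'-proximal, G/A strand
SEQ_ID_4 = "CCCACAGCACTTAGAGCTGC"  # 3'-distal, G/A strand

# True when the two rows of the clone describe opposite strands of one duplex.
rows_are_opposite_strands = (
    reverse_complement(SEQ_ID_1) == SEQ_ID_4
    and reverse_complement(SEQ_ID_2) == SEQ_ID_3
)
print(rows_are_opposite_strands)
```

The same check could be applied to any other clone of Table 1 by substituting its four flanking sequences.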

[0207] The present specification refers to the above sequences by their sequence ID numbers (i.e., SEQ ID NO). To facilitate such disclosure, algebraic notation (such as “2n+1”) is employed, in accordance with conventional algebra. Thus, the designation “SEQ ID NO: (2n+1)” denotes SEQ ID NO: 5 where n=2, and SEQ ID NO: 7 where n=3, etc.
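As a brief illustration of this notation, the sketch below maps an index n to the pair of sequence ID numbers it designates. The helper name `snp_seq_ids` is hypothetical, and the range 0 through 35 is taken from the claims, where it covers the 72 equine flanking sequences of Table 1.

```python
def snp_seq_ids(n: int) -> tuple:
    """Return (proximal SEQ ID NO, distal SEQ ID NO) for index n, i.e. (2n+1, 2n+2)."""
    if not 0 <= n <= 35:
        raise ValueError("n is assumed to run from 0 through 35")
    return 2 * n + 1, 2 * n + 2


# n = 2 designates the first row of clone 595-3 in Table 1.
print(snp_seq_ids(2))
```

Under this reading, each value of n corresponds to one row of Table 1: the odd number identifies the 5′-proximal sequence and the following even number identifies the 3′-distal sequence of the same row.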

EXAMPLE 3

[0208] Allelic Frequency Analysis of Equine Polymorphisms in Small Population Studies

[0209] Small population studies (50-60 animals) of these DNA sequence polymorphisms have been carried out on a number of these polymorphic sites using Genetic Bit Analysis (GBA), the preferred solid-phase, single nucleotide interrogation system (Goelet, P. et al., WO 92/15712). The 7 steps of the most preferred embodiment are illustrated in FIG. 7:

[0210] Step 1: DNA preparation.

[0211] Step 2: Amplification of Target Sequence. After DNA is prepared from the sample, a specific region of the sample genome (locus) is amplified using the PCR. One of the PCR primers is modified with four phosphorothioate linkages at the 5′-end.

[0212] Step 3: Exonuclease Digestion and the Generation of Single-Stranded Template. The PCR product is digested with exonuclease, leaving the phosphorothioated strand intact.

[0213] Step 4: Hybridization to Capture the Amplified Template. The template strand is next hybridized to the appropriate GBA primer that is immobilized on the surface of a microtiter well.

[0214] Step 5: Single Base Extension with Polymerase. DNA polymerase and haptenated ddNTPs are used to extend the GBA primer by one base in a template-dependent manner.

[0215] Step 6: Colorimetric Detection of the Extension Product. After the template is washed away using NaOH, the haptenated base is detected using an anti-hapten conjugate and the appropriate colorimetric substrate.

[0216] Step 7: Computer-Assisted Interpretation of Genotype. The colorimetric data from a number of loci are converted to an SNP genotype for the particular individual tested.

[0217] The method is preferably conducted in the following manner:

[0218] GBA Template Preparation.

[0219] Amplification of genomic sequences was performed using the polymerase chain reaction (PCR). In a first step, one hundred nanograms of genomic DNA was used in a reaction mixture containing each first round primer at a concentration of 2 μM and 10 mM Tris pH 8.3, 50 mM KCl, 1.5 mM MgCl₂, 0.01% gelatin, and 0.05 units per μl Taq DNA Polymerase (AmpliTaq, Perkin Elmer).

[0220] To obtain single-stranded template for use with solid-phase immobilized primer, either of two methods may be used. First, the amplification may be mediated using primers that contain 4 phosphorothioate-nucleotide derivatives, as taught by Nikiforov, T. (U.S. patent application Ser. No. 08/005,061). Alternatively, a second round of PCR may be performed using “asymmetric” primer concentrations. The products of the first reaction are diluted 1/1000 in a second reaction. One of the second round primers is used at the standard concentration of 2 μM while the other is used at 0.08 μM. Under these conditions, single-stranded molecules are synthesized during the reaction.

[0221] Solid Phase Immobilization of Nucleic Acids.

[0222] For the GBA procedure, solid-phase attachment of the template-primer complex simplifies washes, buffer exchanges, etc., and in principle this attachment can be either via the template or the primer. In practice, however, especially when non gel-based detection methods are employed, attachment via the primer is preferable. This format allows the use of stringent washes (e.g., 0.2 N NaOH) to remove impurities and reaction side products while retaining the haptenated dideoxynucleotide covalently linked to the 3′-end of the primer.

[0223] Therefore, for GBA reactions in 96-well plates (Nunc Nunclon plates, Roskilde, Denmark), the GBA primer was covalently coupled to the plate. This was accomplished by incubating 10 pmoles of primer having a 5′-amino group per well in 50 μl of 3 mM sodium phosphate buffer, pH 6, 20 mM 1-ethyl-3-(3-dimethylaminopropyl)-carbodiimide (EDC) overnight at room temperature. After coupling, the plate was washed three times with TNTw.

[0224] GBA in Microwell Plates.

[0225] Hybridization of single-stranded DNA to primers covalently coupled to 96-well plates was accomplished by adding an equal volume of 3 M NaCl, 20 mM EDTA to the single-stranded PCR product and incubating each well with 20 μl of this mixture at 20° C. for 30 minutes. The plate was subsequently washed three times with TNTw. Twenty μl of polymerase extension mix containing ddNTPs (3 μM each, one of which was biotinylated), 5 mM DTT, 7.5 mM sodium isocitrate, 5 mM MnCl₂, and 0.04 units per μl of Klenow DNA polymerase was then added to each well and incubated for 5 minutes at room temperature.

[0226] Following the extension reaction, the plate was washed once with TNTw. Template strands were removed by incubating wells with 50 μl of 0.2 N NaOH for 5 minutes at room temperature, then washing the well with another 50 μl of 0.2 N NaOH. The plate was then washed three times with TNTw. Incorporation of biotinylated ddNTPs was measured by an enzyme-linked assay. Each well was incubated with 20 μl of streptavidin-conjugated horseradish peroxidase (1/1000 dilution in TNTw of product purchased from BRL, Gaithersburg, Md.) with agitation for 30 minutes at room temperature. After washing 5 times with TNTw, 100 μl of o-phenylenediamine (OPD, 1 mg/ml in 0.1 M citric acid, pH 4.5) (BRL) containing 0.012% H₂O₂ was added to each well. The amount of bound enzyme was determined kinetically with a Molecular Devices model “Vmax” 96-well spectrophotometer. FIGS. 8A and 8B illustrate how horse parentage data appear at the microtiter plate level. In standard horse parentage testing, samples are arrayed 85 to a plate (columns 1-11) plus controls (column 12). For each horse locus, the presence of the two known alleles is determined by base-specific interrogation on separate plates. The two plates shown in FIGS. 8A and 8B are identical in PCR template and GBA primer and differ only in the biotinylated ddNTP that was used in the extension reaction (biotin-ddCTP in FIG. 8A and biotin-ddTTP in FIG. 8B). Upon addition of the colorimetric reagent (OPD), the absorbance of the resultant color was measured in a Molecular Devices microtiter plate reader and the raw data were generated in milliOD/min per well. The two raw data gray scale representations of the absorbance data for these plates are shown in the figures, arranged in the exact same order as on the microtiter plates. Gray scale intensity correlates directly with color production. At this biallelic locus the bases detected are C (FIG. 8A) and T (FIG. 8B). Approximately 40% of horses tested to date are heterozygotes (the sample in well A1, for example) and the remainder are homozygous for C (A2, for example) or T (B3, for example). Synthetic template controls include a control C homozygote (well E12), a control T homozygote (well F12) and a control heterozygote (well G12). Scale refers to milliOD/min at 450 nm. Most positive samples had signals above 100 in this case. In this format, for a 28 biallelic marker panel horse parentage test, 56 such plates would be required for complete typing of the 85 horses.

[0227] Fifty-one random, unrelated horses and three sire/dam/foal families were chosen for study in order to establish that a reasonable subset of the group of DNA markers found to date was likely to provide the desired p(exc) ≥ 0.90, and to assess the power of the DNA markers, thereby allowing them to be prioritized for definitive allelic frequency measurements.

[0228] PCR-generated single-stranded template DNA was prepared from the genomic DNA of each animal. This material was typed with respect to nucleotide variants using GBA. The genotype data obtained for each polymorphic site are summarized in Table 2. From this genotype data, allelic frequencies were determined and used to calculate the p(exc) of each site. The cumulative p(exc) for the group of 18 sites listed in Tables 1 and 2 is 0.955. In Tables 2-5, the genotype is indicated as either homozygote (i.e., PP or QQ) or heterozygote (PQ). The numbers in parentheses denote the number of animals observed with that genotype.

TABLE 2

LOCUS  Genotype 1  Genotype 2  Genotype 3  p      q      p(exc)  p(non-exc)  cum p(non-exc)  cum p(exc)
       PP (#)      PQ (#)      QQ (#)
324-1  CC (11)     CT (30)     TT (19)     0.433  0.567  0.185   0.815       0.815           0.185
324-2  CC (21)     CT (24)     TT (9)      0.611  0.389  0.181   0.819       0.667           0.333
459-1  AA (5)      AC (22)     CC (31)     0.276  0.724  0.160   0.840       0.560           0.440
459-2  CC (53)     CG (6)      GG (0)      0.949  0.051  0.046   0.954       0.535           0.465
474-1  AA (35)     AT (21)     TT (4)      0.758  0.242  0.150   0.850       0.453           0.547
178-1  AA (38)     AG (16)     GG (4)      0.793  0.207  0.137   0.863       0.391           0.609
090-2  AA (13)     AG (28)     GG (17)     0.466  0.534  0.187   0.813       0.318           0.682
177-1  AA (2)      AC (12)     CC (46)     0.133  0.867  0.102   0.898       0.285           0.715
177-2  CC (18)     CT (23)     TT (18)     0.500  0.500  0.188   0.813       0.232           0.768
595-3  AA (14)     AG (28)     GG (11)     0.528  0.472  0.187   0.813       0.189           0.811
177-3  AA (26)     AG (25)     GG (9)      0.642  0.358  0.177   0.823       0.155           0.845
595-2  GG (34)     GT (13)     TT (3)      0.810  0.190  0.130   0.870       0.135           0.865
595-1  AA (25)     AG (21)     GG (5)      0.696  0.304  0.167   0.833       0.113           0.887
085-1  CC (32)     CG (24)     GG (4)      0.733  0.267  0.157   0.843       0.095           0.905
129-1  AA (7)      AT (33)     TT (20)     0.392  0.608  0.181   0.819       0.078           0.922
007-1  CC (22)     CG (29)     GG (9)      0.608  0.392  0.181   0.819       0.064           0.936
007-2  AA (3)      AG (25)     GG (31)     0.263  0.737  0.156   0.844       0.054           0.946
007-3  AA (27)     AG (32)     GG (1)      0.717  0.283  0.162   0.838       0.045           0.955
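The calculation behind Table 2 can be sketched as follows. The specification does not state the formula used for p(exc); the sketch assumes the standard expression for a biallelic locus, p(exc) = pq(1 − pq), which reproduces the tabulated values for the loci checked here. The function names are illustrative only.

```python
def allele_frequencies(n_pp: int, n_pq: int, n_qq: int) -> tuple:
    """Allele frequencies (p, q) from the three genotype counts of one locus."""
    total_alleles = 2 * (n_pp + n_pq + n_qq)
    p = (2 * n_pp + n_pq) / total_alleles
    return p, 1.0 - p


def p_exc(p: float, q: float) -> float:
    """Exclusion probability for a biallelic locus (assumed formula pq(1 - pq))."""
    return p * q * (1.0 - p * q)


def cumulative_p_exc(per_locus: list) -> float:
    """Combine independent loci: 1 minus the product of the non-exclusion terms."""
    non_exc = 1.0
    for value in per_locus:
        non_exc *= 1.0 - value
    return 1.0 - non_exc


# Genotype counts for the first two loci of Table 2.
p1, q1 = allele_frequencies(11, 30, 19)  # locus 324-1
p2, q2 = allele_frequencies(21, 24, 9)   # locus 324-2
exc1, exc2 = p_exc(p1, q1), p_exc(p2, q2)
print(round(p1, 3), round(exc1, 3), round(cumulative_p_exc([exc1, exc2]), 3))
```

Multiplying the non-exclusion probabilities assumes that the loci segregate independently, which is an assumption of this sketch rather than a statement of the specification.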

EXAMPLE 4

[0229] Parentage Testing

[0230] A family consisting of a sire, dam and offspring was typed with respect to the 18 variable sites discussed above with no exclusions found. This family had not been previously blood typed. Using the preliminary allelic frequency numbers given in Table 2, it is possible to construct a p(exc) table pertaining to this specific case (Table 3). In general, this Table is constructed assuming that the identity of the dam is not in question (although in practice, it is possible to exclude the mare if neither of her alleles is inherited by the foal). Table 3 shows the typing data for the foal and its dam with the sites tested listed in order of informativeness in this case. The overall cum p(exc) using 18 loci was 0.942.

TABLE 3

LOCUS  FOAL  DAM  EXCLUDED SIRES  p(exc)  p(non-exc)  cum p(non-exc)  cum p(exc)
459-1  AC    CC   AA              0.524   0.476       0.476           0.524
129-1  AA    AT   TT              0.370   0.630       0.300           0.700
324-1  CC    CT   TT              0.321   0.679       0.204           0.796
595-3  GG    GG   AA              0.279   0.721       0.147           0.853
090-2  GG    AG   AA              0.217   0.783       0.115           0.885
324-2  CC    CT   TT              0.151   0.849       0.098           0.902
595-1  AA    AA   GG              0.092   0.818       0.080           0.920
007-3  AA    AA   GG              0.080   0.920       0.073           0.927
085-1  CC    CC   GG              0.071   0.929       0.068           0.932
474-1  AA    AA   TT              0.059   0.941       0.064           0.936
178-1  AA    AG   GG              0.043   0.957       0.061           0.939
595-2  GG    GG   TT              0.036   0.964       0.059           0.941
177-1  CC    CC   AA              0.018   0.982       0.058           0.942
459-2  CC    CC   GG              0.003   0.997       0.058           0.942
007-1  CG    CG   none            0.000   1.000       0.058           0.942
007-2  AG    AG   none            0.000   1.000       0.058           0.942
177-2  CT    CT   none            0.000   1.000       0.058           0.942
177-3  AG    AG   none            0.000   1.000       0.058           0.942
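The exclusion logic underlying a table of this kind can be illustrated with a short sketch. It assumes simple Mendelian inheritance at a biallelic site with the dam's identity not in question; the function name `sire_excluded` and the two-letter genotype strings are conventions introduced here for illustration.

```python
def sire_excluded(foal: str, dam: str, sire: str) -> bool:
    """Return True if the candidate sire cannot be the parent at this site.

    Genotypes are two-letter strings such as "AT". The foal must receive
    one allele from the dam and the other from the sire.
    """
    first, second = foal[0], foal[1]
    compatible = (first in dam and second in sire) or (
        second in dam and first in sire
    )
    return not compatible


# Locus 129-1 in Table 3: foal AA, dam AT, so a TT sire cannot supply an A.
print(sire_excluded("AA", "AT", "TT"))
# Locus 007-1 in Table 3: foal CG, dam CG, so neither homozygous sire is excluded.
print(sire_excluded("CG", "CG", "CC"), sire_excluded("CG", "CG", "GG"))
```

When the foal and dam are both heterozygous, every sire genotype remains compatible, which is why such sites contribute a p(exc) of zero in the case at hand.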

EXAMPLE 5

[0231] Identity Testing

[0232] It is of interest to make use of the population analysis group to derive preliminary information concerning other aspects of the marker panel. For example, using the allelic frequency data, it is possible to calculate a probability of identity [p(ID)] value for the 18 sites which is equal to 4.79×10⁻⁷ or approximately 1 in 2.1 million. Thus, one would predict that none of the horses examined in the population group would have the same genotype, and computer analysis of the genotype database revealed this to be the case. As shown in Table 4, the p(ID) reaches very small numbers with analysis of comparatively few loci. Using the top seven sites, the probability of two random animals having different genotypes is already 99.9%.

TABLE 4

LOCUS  GENOTYPE 1  GENOTYPE 2  GENOTYPE 3  p      q      p(ID)  cum p(ID)
       PP (#)      PQ (#)      QQ (#)
177-2  CC (18)     CT (23)     TT (18)     0.500  0.500  0.375  0.375
595-3  AA (14)     AG (28)     GG (11)     0.528  0.472  0.376  0.141
090-2  AA (13)     AG (28)     GG (17)     0.466  0.534  0.376  0.053
324-1  CC (11)     CT (30)     TT (19)     0.433  0.567  0.380  0.020
129-1  AA (7)      AT (33)     TT (20)     0.392  0.608  0.388  0.008
007-1  CC (22)     CG (29)     GG (9)      0.608  0.392  0.388  0.003
324-2  CC (21)     CT (24)     TT (9)      0.611  0.389  0.388  0.001
177-3  AA (26)     AG (25)     GG (9)      0.642  0.358  0.397  4.67 × 10⁻⁴
595-1  AA (25)     AG (21)     GG (5)      0.696  0.304  0.422  1.97 × 10⁻⁴
007-3  AA (27)     AG (32)     GG (1)      0.717  0.283  0.435  8.57 × 10⁻⁵
459-1  AA (5)      AC (22)     CC (31)     0.276  0.724  0.440  3.77 × 10⁻⁵
085-1  CC (32)     CG (24)     GG (4)      0.733  0.267  0.447  1.68 × 10⁻⁵
007-2  AA (3)      AG (25)     GG (31)     0.263  0.737  0.450  7.58 × 10⁻⁶
474-1  AA (35)     AT (21)     TT (4)      0.758  0.242  0.468  3.55 × 10⁻⁶
178-1  AA (38)     AG (16)     GG (4)      0.793  0.207  0.505  1.79 × 10⁻⁶
595-2  GG (34)     GT (13)     TT (3)      0.810  0.190  0.527  9.45 × 10⁻⁷
177-1  AA (2)      AC (12)     CC (46)     0.133  0.867  0.618  5.84 × 10⁻⁷
459-2  CC (53)     CG (6)      GG (0)      0.949  0.051  0.821  4.79 × 10⁻⁷
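The per-locus p(ID) values are consistent with the usual expression for the probability that two random animals share a genotype at a biallelic site, p⁴ + (2pq)² + q⁴, although the specification does not state the formula. The sketch below uses that assumed formula together with hypothetical function names to reproduce the first entries of Table 4.

```python
def p_id(p: float) -> float:
    """Probability that two random individuals match at one biallelic locus.

    Assumes Hardy-Weinberg genotype frequencies p^2, 2pq and q^2.
    """
    q = 1.0 - p
    return p ** 4 + (2.0 * p * q) ** 2 + q ** 4


def cumulative_p_id(allele_freqs: list) -> float:
    """Product of the per-locus values, assuming the loci are independent."""
    product = 1.0
    for p in allele_freqs:
        product *= p_id(p)
    return product


# Allele frequencies p for loci 177-2 and 595-3, taken from Table 4.
print(round(p_id(0.500), 3), round(p_id(0.528), 3))
print(round(cumulative_p_id([0.500, 0.528]), 3))
```

Because each per-locus value is well below 1, the cumulative product falls quickly as loci are added, which is the behavior described for the top seven sites.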

[0233] False Report Rate

[0234] In the current study, two types of potential false reports can be encountered due to either (1) PCR failures or (2) incompatibility between the genotype obtained on opposite strands. Only data from those animals which had been successfully typed in both strands was included in the allelic frequency calculations. Sixty horses typed with respect to 18 sites amounts to 1,080 genotypings. 95% of all typing experiments were successful overall. No typing errors were due to traditional PCR failures. 3.8% false reports were encountered at the GBA step either because the PCR was unsuccessful at the single strand step or due to operator error. 1.1% of all typings produced incompatible data between the strands for unknown reasons.

[0235] In sum, the GBA (genetic bit analysis) method is thus a simple, convenient, and automatable method for interrogating SNPs. In this method, sequence-specific annealing to a solid phase-bound primer is used to select a unique polymorphic site in a nucleic acid sample, and interrogation of this site is via a highly accurate DNA polymerase reaction using a set of novel non-radioactive dideoxynucleotide analogs. One of the most attractive features of the GBA approach is that, because the actual allelic discrimination is carried out by the DNA polymerase, one set of reaction conditions can be used to interrogate many different polymorphic loci. This feature permits cost reductions in complex DNA tests by exploitation of parallel formats and provides for rapid development of new tests.

[0236] The intrinsic error rate of the GBA procedure in its present format is believed to be low; the signal-to-noise ratio in terms of correct vs. incorrect nucleotide incorporation for homozygotes appears to be approximately 20:1. GBA is thus sufficiently quantitative to allow the reliable detection of heterozygotes in genotyping studies. The presence in the DNA polymerase-mediated extension reaction of all four dideoxynucleoside triphosphates as the sole nucleotide substrates heightens the fidelity of genotype determinations by suppressing misincorporation. GBA can be used in any application where point mutation analyses are presently employed (including genetic mapping and linkage studies, genetic diagnoses, and identity/paternity testing), assuming that the surrounding DNA sequence is known.

EXAMPLE 6

[0237] Analysis of a Human SNP

[0238] Human single nucleotide polymorphisms may be used in the same manner as the above-described equine polymorphisms. Examples of suitable human polymorphisms are presented in Table 5.

TABLE 5
EXAMPLES OF HUMAN SINGLE NUCLEOTIDE POLYMORPHISMS

SNP LOCUS  SNP LOCATION  SEQ ID NO.  5′ PROXIMAL SEQUENCE      ALLELE 1  ALLELE 2  3′ DISTAL SEQUENCE        SEQ ID NO.
IGKC       2p12          73          AAAGCAGACTACGAGAAACACAAA  G         C         TCTACGCCTGCGAAGTCACCCATC  74
                         75          GATGGGTGACTTCGCAGGCGTAGA  C         G         TTTGTGTTTCTCGTAGTCTGCTTT  76
ILIB       2q3-q21       77          CTCCTGCAATTGACAGAGAGCTCC  C         T         GAGGCAGAGAACAGCACCCAAGGT  78
                         79          ACCTTGGGTGCTGTTCTCTGCCTC  G         A         GGAGCTCTCTGTCAATTGCAGGAG  80
LDLR       19p13.3       81          CTCCATCTCAAGCATCGATGTCAA  T         C         GGGGGCAACCGGAAGACCATCTTG  82
                         83          CAAGATGGTCTTCCGGTTGCCCCC  A         G         TTGACATCGATGCTTGAGATGGAG  84
MET-H      7q31          85          GTTTGGTCTAAGTTGCTGATTACC  A         G         GGATTTTTCTGACGATCTTTCAAC  86
                         87          GTTGAAAGATCGTCAGAAAAATCC  T         C         GGTAATCAGCAACTTAGACCAAAC  88
PROC       2q13-q21      89          GCTGACAGCGGCCCACTGCATGGA  T         C         GAGTCCAAGAAGCTCCTTGTCAGG  90
                         91          CCTGACAAGGAGCTTCTTGGACTC  A         G         TCCATGCAGTGGGCCGCTGTCAGC  92

[0239] For the purpose of validating the strategy of converting human SNPs to a GBA test format, a phenotypically neutral SNP site was converted and tested by GBA. This site was selected from the Johns Hopkins University OMIM database of human polymorphisms. The site is met-H on chromosome 7 at q31, mutation position 127, A to G (Horn, G. T. et al., Clin. Chem. 36, 1614-1619, 1990). The following oligonucleotides were synthesized (p=phosphorothioate):

[0240] PCR primer no. 1552 (SEQ ID NO: 93)

[0241] 5′-CpApTpCpCATGTAGGAGAGCCTTAGTC

[0242] PCR primer no. 1553 (SEQ ID NO: 94)

[0243] 5′-CCATTTTTGTGTCTTCTAGTCTAAGG

[0244] GBA primer no. 1554 (SEQ ID NO: 95)

[0245] 5′-TTGAAAGATCGTCAGAAAAATCC
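The primer notation above can be unpacked with a short sketch. The helper `parse_primer` is a hypothetical name introduced for illustration; it strips the lowercase “p” markers to recover the base sequence and counts the phosphorothioate linkages. The sketch also checks how the GBA primer relates to SEQ ID NO: 87 of Table 5.

```python
def parse_primer(notation: str) -> tuple:
    """Split primer notation into (base sequence, number of 'p' linkages)."""
    return notation.replace("p", ""), notation.count("p")


PRIMER_1552 = "CpApTpCpCATGTAGGAGAGCCTTAGTC"   # PCR primer no. 1552
GBA_PRIMER_1554 = "TTGAAAGATCGTCAGAAAAATCC"    # GBA primer no. 1554
SEQ_ID_87 = "GTTGAAAGATCGTCAGAAAAATCC"         # 5'-proximal sequence, Table 5

bases, n_phosphorothioate = parse_primer(PRIMER_1552)
print(bases, n_phosphorothioate)

# The GBA primer is the 23 nucleotides of SEQ ID NO: 87 that lie immediately
# 5' of the polymorphic site, so its 3' end abuts the base to be interrogated.
print(SEQ_ID_87.endswith(GBA_PRIMER_1554))
```

The count of four linkages agrees with Step 2 of the method, in which one PCR primer carries four phosphorothioate linkages at its 5′-end.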

[0246] Human DNA samples were randomly selected from the DNA archives of two families available from the Centre D'Etude du Polymorphisme Humaine (CEPH) family collection. A negative control, containing no DNA, was also used. Sample DNAs were amplified by PCR using the above primers and the resulting product was analyzed by GBA for two potential bases at the polymorphic site, G and A. GBA results were obtained by an endpoint reading of absorbance at 450 nm in a microtiter plate reader. The data are presented in Table 6.

[0247] Samples 1, 2, 4, 6 and 8 were homozygous for A, samples 7 and 9 were homozygous for G and samples 3 and 5 were GA heterozygotes. These DNAs have not been tested for this biallelism by any other method to date.

TABLE 6

                      Absorbance at 450 nm (A₄₅₀)
Sample No.  CEPH DNA No.  Base G  Base A  Genotype
1           1333-10       .100    .556    AA
2           1333-02       .084    .782    AA
3           1333-04       .372    .369    GA
4           1333-05       .081    .905    AA
5           1333-07       .321    .346    GA
6           1333-08       .084    .803    AA
7           1340-09       .675    .092    GG
8           1340-10       .084    .756    AA
9           1340-12       .537    .096    GG
No DNA      N/A           .076    .097    N/A
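The conversion of absorbance readings into genotypes can be sketched as a simple thresholding step. The specification does not give the rule actually used; the cutoff of 0.2 below is an assumption chosen only because it separates the signal and background readings of Table 6, and the function name is hypothetical.

```python
ASSUMED_THRESHOLD = 0.2  # hypothetical cutoff, not taken from the specification


def call_genotype(a450_g: float, a450_a: float,
                  threshold: float = ASSUMED_THRESHOLD) -> str:
    """Call a genotype from the base G and base A absorbance readings."""
    has_g = a450_g > threshold
    has_a = a450_a > threshold
    if has_g and has_a:
        return "GA"
    if has_g:
        return "GG"
    if has_a:
        return "AA"
    return "N/A"  # neither base detected, as in the no-DNA control


# (Base G, Base A) readings for samples 1-9 and the no-DNA control of Table 6.
READINGS = [
    (0.100, 0.556), (0.084, 0.782), (0.372, 0.369), (0.081, 0.905),
    (0.321, 0.346), (0.084, 0.803), (0.675, 0.092), (0.084, 0.756),
    (0.537, 0.096), (0.076, 0.097),
]
calls = [call_genotype(g, a) for g, a in READINGS]
print(calls)
```

A production assay would more plausibly derive its cutoff from control wells on each plate; the fixed value here serves only to show the decision structure.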

[0252] While the invention has been described in connection with specific embodiments thereof, it will be understood that it is capable of further modifications and this application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains and as may be applied to the essential features hereinbefore set forth and as follows in the scope of the appended claims.

SEQUENCE LISTING (95 sequences)

Descriptors common to all 95 entries: nucleic acid; single; linear; DNA (genomic); NO; NO.

SEQ ID NO  LENGTH (base pairs)  ORGANISM        CLONE OR LOCUS  SEQUENCE
1          20                   Equus caballus  177-2           GCAGCTCTAA GTGCTGTGGG
2          20                   Equus caballus  177-2           TGCAGAAATT CTAAGGTGTT
3          20                   Equus caballus  177-2           AACACCTTAG AATTTCTGCA
4          20                   Equus caballus  177-2           CCCACAGCAC TTAGAGCTGC
5          20                   Equus caballus  595-3           AGCTCTGGGA TGATCCACTA
6          20                   Equus caballus  595-3           TGAGGGAAAA ATGATGATGC
7          20                   Equus caballus  595-3           GCATCATCAT TTTTCCCTCA
8          20                   Equus caballus  595-3           TAGTGGATCA TCCCAGAGCT
9          20                   Equus caballus  090-2           AAAACTAATT TGATGGCCAT
10         20                   Equus caballus  090-2           AAAGTCAGAA CAATGATTGC
11         20                   Equus caballus  090-2           GCAATCATTG TTCTGACTTT
12         20                   Equus caballus  090-2           ATGGCCATCA AATTAGTTTT
13         20                   Equus caballus  324-1           CACAAGGCCC AAGAACAGGA
14         20                   Equus caballus  324-1           TGAGTTCAGC GAGTGTCAGA
15         20                   Equus caballus  324-1           TCTGACACTC GCTGAACTCA
16         20                   Equus caballus  324-1           TCCTGTTCTT GGGCCTTGTG
17         20                   Equus caballus  129-1           TGGGAAAGAC CACATTATTT
18         20                   Equus caballus  129-1           GTTCCCTTTT GTTTCAGACC
19         20                   Equus caballus  129-1           GGTCTGAAAC AAAAGGGAAC
20         20                   Equus caballus  129-1           AAATAATGTG GTCTTTCCCA
21         20                   Equus caballus  007-1           CATGAGTAAG AAGCATCCGG
22         20                   Equus caballus  007-1           CCATGGAGTC ATAGATAAGT
23         20                   Equus caballus  007-1           ACTTATCTAT GACTCCATGG
24         20                   Equus caballus  007-1           CCGGATGCTT CTTACTCATG
25         20                   Equus caballus  324-2           CCCAAGAACA GGATTGAGTT
26         20                   Equus caballus  324-2           AGCGAGTGTC AGAGTTGTGT
27         20                   Equus caballus  324-2           ACACAACTCT GACACTCGCT
28         20                   Equus caballus  324-2           AACTCAATCC TGTTCTTGGG
29         20                   Equus caballus  177-3           AGCAAGAAAT GGGGGGCCTT
30         20                   Equus caballus  177-3           GTCCTACAAT TGCCAGGAAG
31         20                   Equus caballus  177-3           CTTCCTGGCA ATTGTAGGAC
32         20                   Equus caballus  177-3           AAGGCCCCCC ATTTCTTGCT
33         20                   Equus caballus  595-1           GAATATCAAT ATATATATAT
34         20                   Equus caballus  595-1           TGTGTGTGTG TGTATTTGCT
35         20                   Equus caballus  595-1           AGCAAATACA CACACACACA
36         20                   Equus caballus  595-1           ATATATATAT ATTGATATTC
37         20                   Equus caballus  007-3           GCCATAATTA AGCCTGTATT
38         20                   Equus caballus  007-3           GTTTGTTTTA AATTTTGTGA
39         20                   Equus caballus  007-3           TCACAAAATT TAAAACAAAC
40         20                   Equus caballus  007-3           AATACAGGCT TAATTATGGC
41         20                   Equus caballus  459-1           GTGTAGAGTA GTTCAAGGAC
42         20                   Equus caballus  459-1           ATGTCTTATA CCTCCCTTTT
43         20                   Equus caballus  459-1           AAAAGGGAGG TATAAGACAT
44         20                   Equus caballus  459-1           GTCCTTGAAC TACTCTACAC
45         20                   Equus caballus  085-1           GTGAACGGAG AGCAGGCCTT
46         20                   Equus caballus  085-1           CCTGCTGAAG CCTCAGACCG
47         20                   Equus caballus  085-1           CGGTCTGAGG CTTCAGCAGG
48         20                   Equus caballus  085-1           AAGGCCTGCT CTCCGTTCAC
49         20                   Equus caballus  007-2           CTGCTCTTTA GACTATGACC
50         20                   Equus caballus  007-2           TCAACCTTGC ATCATGAGCT
51         20                   Equus caballus  007-2           AGCTCATGAT GCAAGGTTGA
52         20                   Equus caballus  007-2           GGTCATAGTC TAAAGAGCAG
53         20                   Equus caballus  474-1           TTTGAGCTGG GACCTCAGTC
54         20                   Equus caballus  474-1           TCTCCTGCCT TTAGACTCGA
55         20                   Equus caballus  474-1           TCGAGTCTAA AGGCAGGAGA
56         20                   Equus caballus  474-1           GACTGAGGTC CCAGCTCAAA
57         20                   Equus caballus  178-1           GAACCTCTGG GCCGTGGATA
58         20                   Equus caballus  178-1           TTGTTCAGAA GCACAGGTGA
59         20                   Equus caballus  178-1           TCACCTGTGC TTCTGAACAA
60         20                   Equus caballus  178-1           TATCCACGGC CCAGAGGTTC
61         20                   Equus caballus  595-2           GTATTTGCTA GCTCTGGGAT
62         20                   Equus caballus  595-2           ATCCACTAAT GAGGGAAAAA
63         20                   Equus caballus  595-2           TTTTTCCCTC ATTAGTGGAT
64         20                   Equus caballus  595-2           ATCCCAGAGC TAGCAAATAC
65         20                   Equus caballus  177-1           GAAGTTGTGG GACAGATGTG
66         20                   Equus caballus  177-1           AGAGATGCAG CTCTAAGTGC
67         20                   Equus caballus  177-1           GCACTTAGAG CTGCATCTCT
68         20                   Equus caballus  177-1           CACATCTGTC CCACAACTTC
69         20                   Equus caballus  459-2           CCATGAGGAA GCCTCCACAA
70         20                   Equus caballus  459-2           GTCCCAATAG TCTGGGATTC
71         20                   Equus caballus  459-2           GAATCCCAGA CTATTGGGAC
72         20                   Equus caballus  459-2           TTGTGGAGGC TTCCTCATGG
73         24                   Homo sapiens    IGKC 2p12       AAAGCAGACT ACGAGAAACA CAAA
74         24                   Homo sapiens    IGKC 2p12       TCTACGCCTG CGAAGTCACC CATC
75         24                   Homo sapiens    IGKC 2p12       GATGGGTGAC TTCGCAGGCG TAGA
76         24                   Homo sapiens    IGKC 2p12       TTTGTGTTTC TCGTAGTCTG CTTT
77         24                   Homo sapiens    ILIB 2q3-q21    CTCCTGCAAT TGACAGAGAG CTCC
78         24                   Homo sapiens    ILIB 2q3-q21    GAGGCAGAGA ACAGCACCCA AGGT
79         24                   Homo sapiens    ILIB 2q3-q21    ACCTTGGGTG CTGTTCTCTG CCTC
80         24                   Homo sapiens    ILIB 2q3-q21    GGAGCTCTCT GTCAATTGCA GGAG
81         24                   Homo sapiens    LDLR 19p13.3    CTCCATCTCA AGCATCGATG TCAA
82         24                   Homo sapiens    LDLR 19p13.3    GGGGGCAACC GGAAGACCAT CTTG
83         24                   Homo sapiens    LDLR 19p13.3    CAAGATGGTC TTCCGGTTGC CCCC
84         24                   Homo sapiens    LDLR 19p13.3    TTGACATCGA TGCTTGAGAT GGAG
85         24                   Homo sapiens    MET-H 7q31      GTTTGGTCTA AGTTGCTGAT TACC
86         24                   Homo sapiens    MET-H 7q31      GGATTTTTCT GACGATCTTT CAAC
87         24                   Homo sapiens    MET-H 7q31      GTTGAAAGAT CGTCAGAAAA ATCC
88         24                   Homo sapiens    MET-H 7q31      GGTAATCAGC AACTTAGACC AAAC
89         24                   Homo sapiens    PROC 2q13-q21   GCTGACAGCG GCCCACTGCA TGGA
90         24                   Homo sapiens    PROC 2q13-q21   GAGTCCAAGA AGCTCCTTGT CAGG
91         24                   Homo sapiens    PROC 2q13-q21   CCTGACAAGG AGCTTCTTGG ACTC
92         24                   Homo sapiens    PROC 2q13-q21   TCCATGCAGT GGGCCGCTGT CAGC
93         24                   Homo sapiens    MET-H 7q31      CATCCATGTA GGAGAGCCTT AGTC
94         26                   Homo sapiens    MET-H 7q31      CCATTTTTGT GTCTTCTAGT CTAAGG
95         23                   Homo sapiens    MET-H 7q31      TTGAAAGATC GTCAGAAAAA TCC

What is claimed is:
 1. A nucleic acid molecule: (i) having a nucleotidesequence capable of specifically hybridizing to the invariant proximalor invariant distal nucleotide sequence of a single nucleotidepolymorphism, and (ii) being used to specifically detect the singlenucleotide polymorphic site (X) of the single nucleotide polymorphism.2. The nucleic acid molecule of claim 1, wherein said mammal is selectedfrom the group consisting of humans, non-human primates, dogs, cats,cattle, sheep, poultry, and horses.
 3. The nucleic acid molecule of claim 2, wherein said mammal is a horse.
 4. The nucleic acid molecule of claim 3, wherein said molecule has a nucleotide sequence selected from the group consisting of SEQ ID NO: (2n+1), wherein n is an integer selected from the group consisting of 0 through 35.
 5. The nucleic acid molecule of claim 3, wherein the sequence of said immediately 3′-distal segment includes a sequence selected from the group consisting of SEQ ID NO: (2n+2), wherein n is an integer selected from the group consisting of 0 through 35.
 6. A nucleic acid molecule having a sequence complementary to a sequence selected from the group consisting of SEQ ID NO: 1 through SEQ ID NO: 72 in Table 1.
 7. A set of at least two of the nucleic acid molecules of claim 6.
 8. A set of at least two nucleic acid molecules, wherein at least one of said nucleic acid molecules has a sequence complementary to a sequence selected from the group consisting of SEQ ID NO: 1 through SEQ ID NO: 72.
 9. A method for determining the extent of genetic similarity between DNA of a target horse and DNA of a reference horse, which comprises the steps: A) determining, for a single nucleotide polymorphism of said target horse, and for a corresponding single nucleotide polymorphism of said reference horse, whether said polymorphisms contain the same single nucleotide at their respective polymorphic sites; and B) using said comparison to determine the extent of genetic similarity between said target horse and said reference horse.
 10. The method of claim 9, wherein said polymorphic sites have (1) an immediately 5′-proximal sequence selected from the group consisting of SEQ ID NO: (2n+1), and (2) an immediately 3′-distal sequence selected from the group consisting of SEQ ID NO: (2n+2); wherein n is an integer selected from the group consisting of 0 through 35.
 11. The method of claim 9, wherein in step A, said determination is sufficient to establish that said target horse and said reference horse are not the same animal.
 12. The method of claim 9, wherein in step A, said determination is sufficient to establish that said reference horse is not a parent of said target horse.
 13. The method of claim 9, wherein in step A, said reference horse has a trait, and said determination is sufficient to establish that said target horse also has said trait.
 14. The method of claim 9, wherein in step A, said reference horse has a first and second trait, and said determination is sufficient to establish a genetic linkage between said traits.
 15. The method of claim 9, wherein in step A, said determination is accomplished by a method having the sub-steps: (a) incubating a sample of nucleic acid containing said single nucleotide polymorphism of said target horse, or said single nucleotide polymorphism of said reference horse, in the presence of a nucleic acid primer and at least one dideoxynucleotide derivative, under conditions sufficient to permit a polymerase mediated, template-dependent extension of said primer, said extension causing the incorporation of a single dideoxynucleotide to the 3′-terminus of said primer, said single dideoxynucleotide being complementary to the single nucleotide of the polymorphic site of said polymorphism; (b) permitting said template-dependent extension of said primer molecule, and said incorporation of said single dideoxynucleotide; and (c) determining the identity of the nucleotide incorporated into said polymorphic site, said identified nucleotide being complementary to said nucleotide of said polymorphic site.
 16. The method of claim 15, wherein in sub-step (a), said primer is immobilized to a solid support, and wherein in sub-step (b), said template-dependent extension of said primer is conducted on said immobilized primer.
 17. The method of claim 15, wherein, in sub-step (a), said sample is processed to amplify a nucleic acid containing said polymorphism prior to said incubation.
 18. The method of claim 15, wherein sub-step (a) additionally includes using a non-invasive swab to collect said sample of DNA from said horse.
 19. The method of claim 15, wherein in sub-step (a), said polymerase mediated, template-dependent extension of said primer is conducted in the presence of at least two dideoxynucleotide triphosphate derivatives selected from the group consisting of ddATP, ddTTP, ddCTP and ddGTP, but in the absence of dATP, dTTP, dCTP and dGTP.
 20. A method for determining the probability that a target horse will have a particular trait, which comprises the steps: A) determining the identity of a single nucleotide present at a polymorphic site of an equine single nucleotide polymorphism, and being present in more than 51% of a set of reference horses; B) determining whether a single nucleotide present at a polymorphic site of a corresponding single nucleotide polymorphism of said target horse has the same identity as the single nucleotide present at said polymorphic site of said 51% of reference horses exhibiting said trait; C) using said determination of step B to establish the probability that said target horse will have said particular trait.
 21. The method of claim 20, wherein said equine single nucleotide polymorphism has (1) an immediately 5′-proximal sequence selected from the group consisting of SEQ ID NO: (2n+1); and (2) an immediately 3′-distal sequence selected from the group consisting of SEQ ID NO: (2n+2); wherein n is an integer selected from the group consisting of 0 through 35.
 22. The method of claim 20, wherein said trait is an equine genetic disease.
 23. The method of claim 20, wherein said trait is an equine condition.
 24. The method of claim 20, wherein said trait is an equine characteristic.
 25. A method for creating a genetic map of unique sequence equine polymorphisms which comprises the steps: A) identifying at least one pair of inter-breeding reference horses, wherein each of said pairs of horses is characterized by having a first and a second reference horse, said first reference horse having: two alleles (i) and (ii), said alleles each being single nucleotide polymorphic alleles having a single nucleotide polymorphic site; said second reference horse having: a corresponding allele (i′) to said allele (i) of said first reference horse, wherein said allele (i′) has a single nucleotide polymorphic site, and wherein the single nucleotide present at said polymorphic site of said allele (i′) differs from the single nucleotide present at the polymorphic site of said allele (i) of said first reference horse, and B) identifying in a progeny of at least one of said pairs of inter-breeding reference horses the single nucleotide present at a single nucleotide polymorphic site of a corresponding allele of said alleles (i) and (i′), and the single nucleotide present at a single nucleotide polymorphic site of a corresponding allele of said alleles (ii) and (ii′); and C) determining the extent of genetic linkage between said alleles (i) and (ii), to thereby create said genetic map.
 26. The method of claim 25, wherein said steps A, B and C are repeated at least once in cycle, to thereby create a genetic map having more than two polymorphic sites.
 27. The method of claim 25, wherein at least one of said alleles (i) and (ii) has (1) an immediately 5′-proximal sequence selected from the group consisting of SEQ ID NO: (2n+1); and (2) an immediately 3′-distal sequence selected from the group consisting of SEQ ID NO: (2n+2); wherein n is an integer selected from the group consisting of 0 through 35.
 28. A method for predicting whether a target horse will exhibit a predetermined trait which comprises the steps: A) identifying one or more alleles associated with said trait, each allele being a single nucleotide polymorphic allele having a single nucleotide polymorphic site; B) determining for each of said single nucleotide polymorphic alleles, a nucleotide present at said allele's polymorphic site in a reference horse exhibiting said trait, to thereby define a set of single nucleotides at a set of polymorphic sites that are present in a reference horse exhibiting said trait; C) determining the identity of single nucleotides present at corresponding single nucleotide polymorphic alleles of said target horse; and D) comparing the identity of the single nucleotides present at the polymorphic sites of the polymorphisms of said reference animal with the single nucleotides present at said corresponding single nucleotide polymorphic alleles of said target horse.
 29. The method of claim 28, wherein at least one of said polymorphisms has (1) an immediately 5′-proximal sequence selected from the group consisting of SEQ ID NO: (2n+1); and (2) an immediately 3′-distal sequence selected from the group consisting of SEQ ID NO: (2n+2); wherein n is an integer selected from the group consisting of 0 through 35.
 30. A method for identifying a single nucleotide polymorphic site which comprises: A) isolating a fragment of genomic DNA of a reference organism; B) sequencing said fragment of DNA to thereby determine the nucleotide sequence of a segment of said fragment, said segment being of a length sufficient to define the nucleotide sequence of a pair of oligonucleotide primers capable of mediating the specific amplification of said fragment; C) using said oligonucleotide primers to mediate the specific amplification of DNA obtained from a plurality of other organisms of the same species as said reference organism; and D) determining the nucleotide sequences of said amplified DNA molecules of step C, and comparing the sequence of said amplified molecules with the sequence of said fragment of said reference organism to thereby identify a single nucleotide polymorphic site.
 31. A method for interrogating a polymorphic region of a human single nucleotide polymorphism of a target human, said method comprising: A) selecting a known human single nucleotide polymorphism for interrogation; B) identifying the sequence of at least one oligonucleotide that flanks said selected single nucleotide polymorphism; said identified sequence being of a length sufficient to permit the identification of primers capable of being used to effect the specific amplification of said flanking oligonucleotide and said polymorphism; C) using said primers to effect the amplification of said flanking oligonucleotide and said polymorphism of said single nucleotide polymorphism of said target human; and D) interrogating the single nucleotide polymorphism of said amplified polymorphism by genetic bit analysis.
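
By way of illustration only, and without limiting the claims, three relationships recited above may be sketched in Python: the pairing of SEQ ID NO: (2n+1) with SEQ ID NO: (2n+2) for n of 0 through 35 (claims 4, 5 and 10), the site-by-site comparison of claim 9, and the complementarity between the incorporated dideoxynucleotide and the nucleotide of the polymorphic site (claim 15). The function names, the dictionary layout, the locus labels and the genotype values are hypothetical and chosen for the example. The use of a simple fraction of matching sites as the measure of similarity is likewise an assumption, since the claims do not prescribe a particular measure.

```python
from typing import Dict, Tuple

# Watson-Crick complement of each template base, expressed as the
# dideoxynucleotide triphosphate that would pair with it.
COMPLEMENTARY_DDNTP = {"A": "ddTTP", "T": "ddATP", "C": "ddGTP", "G": "ddCTP"}


def flanking_seq_ids(n: int) -> Tuple[int, int]:
    """Return the (2n+1, 2n+2) pair of SEQ ID NOs for n in 0 through 35."""
    if not 0 <= n <= 35:
        raise ValueError("n must be an integer from 0 through 35")
    return 2 * n + 1, 2 * n + 2


def incorporated_ddntp(site_base: str) -> str:
    """Return the ddNTP complementary to the nucleotide at the polymorphic site."""
    return COMPLEMENTARY_DDNTP[site_base.upper()]


def snp_similarity(target: Dict[str, str], reference: Dict[str, str]) -> float:
    """Fraction of shared polymorphic sites carrying the same nucleotide.

    Assumption: one nucleotide is recorded per site, as in claim 9;
    heterozygosity is not modelled in this sketch.
    """
    shared = sorted(set(target) & set(reference))
    if not shared:
        raise ValueError("no polymorphic sites in common")
    matches = sum(1 for locus in shared if target[locus] == reference[locus])
    return matches / len(shared)


# Hypothetical genotypes at four hypothetical polymorphic sites.
target = {"locus_1": "A", "locus_2": "C", "locus_3": "T", "locus_4": "G"}
reference = {"locus_1": "A", "locus_2": "T", "locus_3": "T", "locus_4": "G"}

print(flanking_seq_ids(0))                 # (1, 2)
print(flanking_seq_ids(35))                # (71, 72)
print(incorporated_ddntp("C"))             # ddGTP
print(snp_similarity(target, reference))   # 0.75
```

In this sketch a single mismatching site, such as the hypothetical locus_2 above, lowers the similarity score. Whether such a mismatch suffices to exclude identity or parentage, as in claims 11 and 12, depends on the loci and the evaluation used and is not decided by the code.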